Abstract

Designing efficient concurrent objects often requires abandoning the standard specification technique of linearizability in favor of more relaxed correctness conditions. However, the variety of alternatives makes it difficult to choose which condition to employ, and how to compose them when using objects specified by different conditions.

In this work, we propose a uniform alternative in the form of Hoare logic, which can explicitly capture, in the auxiliary state, the interference of environment threads. We demonstrate the expressiveness of our method by verifying a number of concurrent objects and their clients, which have so far been specified only by the non-standard conditions of concurrency-aware linearizability, quiescent consistency, and quantitative quiescent consistency. We report on the implementation of the ideas in an existing Coq-based tool, providing the first mechanized proofs for all the examples in the paper.

1. Introduction 

Linearizability [26] remains the most well-known correctness condition for concurrent objects. It works by relating a concurrent object to a sequential behavior. More precisely, for each concurrent history of an object, linearizability requires that there exists a mapping to a sequential history, such that the ordering of matching call/return pairs is preserved either if they are performed by the same thread, or if they do not overlap. As such, linearizability has been used to establish the correctness of a variety of concurrent objects such as stacks, queues, sets, locks, and snapshots, all of which have intuitive sequential specs.

However, as argued by Shavit [43], efficient parallelization may require the development of concurrent objects that are inherently non-linearizable: in the presence of interference, such objects exhibit behavior that is not reducible to any sequential behavior via linearizability. To reason about such objects, a variety of novel conditions has been developed: concurrency-aware linearizability (CAL) [22], quiescent consistency (QC) [3, 10], quasi-linearizability (QL) [1], quantitative relaxation [24], quantitative quiescent consistency (QQC) [29], and local linearizability [21], to name a few. These conditions, formulated as relations on execution traces, specify a program's behavior under concurrent interference. Some, such as QC, devote special treatment to the sequential case, qualifying the behavior in the quiescent (i.e., interference-free) moments.

This proliferation of alternative conditions is problematic, as it makes all of them non-canonical. For any specific example, it is difficult to determine which condition to use, or if a new one should be developed. Worse, each new condition requires the development of its own dedicated program logic or verification tool. Furthermore, it is unclear how to combine the conditions/logics/tools when different ones have been used for different subprograms. Finally, having criteria defined semantically, e.g., in terms of execution traces, makes it challenging to employ them directly for reasoning about clients of the corresponding data structures.

1.1 Concurrency specification via program logics 

In this paper, we propose an alternative, uniform approach: a Hoare logic equipped with a special subjective kind of auxiliary state [33] that makes it possible to name the amount of concurrent interference, and relate it to the program's inputs and outputs directly, without reducing to sequential behavior. We use Fine-grained Concurrent Separation Logic (FCSL) [36], which has been designed to reason about higher-order lock-free concurrent programs, and has been recently implemented as a verification tool on top of Coq [41], but whose ability to address non-linearizable programs has not been observed previously.

More specifically, subjective auxiliary state permits that within a spec of a thread, one can refer to the private state (real or auxiliary) of other interfering threads in a local manner. This private state can have arbitrary user-specified structure, as long as it satisfies the properties of a partial commutative monoid (PCM). A particularly important PCM is that of time-stamped histories, which has previously been applied to linearizable objects [42], where it replaced call/return histories. A (logically) time-stamped history consists of entries of the form $t \mapsto a$, signifying that an atomic behavior $a$ occurred at a time (or linearization point) $t$. A subjective specification further distinguishes the histories of the thread and its interfering environment, and usefully relates both to the thread's input and output.

Of course, Hoare-style reasoning about histories is a natural idea, exploited recently in several works [4, 17, 19, 23]. Here, however, we rely on the unifying power of PCMs, in combination with subjective specifications, to show that by generalizing histories in different ways (though all subject to PCM laws) we can capture the essence of several different conditions, such as CAL, QC and QQC, in one and the same off-the-shelf logical system and tool. More precisely, our histories need not merely identify a point at which an atomic behavior logically occurred, but can also include information about interference, or lack thereof. Moreover, we will use generic FCSL constructs for delimiting the scope of auxiliary state, to reason about quiescent moments.

1.2 Contributions and outline 

The ability to use FCSL for specifying and verifying linearizable objects (e.g., fine-grained stacks and atomic snapshots) has been recognized before [42]. In contrast, the main conceptual contribution of this work is an observation that the very same abstractions provided by FCSL are sufficient to ascribe non-trivial non-linearizable objects with specs that can hide object implementation details, but are sufficiently strong to be used in proofs of concurrent client programs, as we demonstrate in Section 2. Specifically, we recognize that auxiliary histories can be subject to user-defined invariants beyond mere adherence to sequential executions (e.g., be concurrency-aware [22]), and can be used to capture intermediate interference, allowing for quantitative reasoning about outcomes of concurrent executions (e.g., in the spirit of QQC [29]). These observations, surprisingly, enabled reasoning about non-linearizable data structures and their clients, which were never previously approached from the perspective of program logics or mechanically verified.

In this unified approach based on program logic, it seems inherently impossible (and contrary to the whole idea) to classify Hoare triples as corresponding to this or that correctness condition. Thus, instead of providing theorems that relate Hoare triples to existing conditions, we justify the adequacy of our approach by proof-of-concept verifications of concurrent objects and their clients.

Hence, as key technical contributions, we present subjective specs and the first mechanized proofs (in Coq) of (1) an elimination-based exchanger [40] (Section 3), previously specified using CAL, and (2) a simple counting network [3] (Section 6) that inspired definitions of QC and QQC. We then employ these specs to verify client programs (Sections 5 and 7). We discuss alternative design choices for specs and further applications of our verification approach in Section 8, and summarize our mechanization experience in Section 9. Section 10 compares to related work and Section 11 concludes.

2. Main Ideas and Overview 

We begin by outlining the high-level intuition of our specification approach, and summarize the main formalization steps. As the first motivating example, we consider the concurrent exchanger structure from java.util.concurrent [15, 40]. The main purpose of the exchanger is to allow two threads to efficiently swap values in a non-blocking way via a globally shared channel. The exchange might fail if a thread trying to swap a value does not encounter a peer to do that in a predefined period of time.

For instance, the result of the two-thread program

$$r_1 := \mathsf{exchange}~1 \;\parallel\; r_2 := \mathsf{exchange}~2 \tag{1}$$

can be described by the following assertion:*

$$r_1 = r_2 = \mathsf{None} \;\lor\; r_1 = \mathsf{Some}~2 \,\land\, r_2 = \mathsf{Some}~1 \tag{2}$$

That is, $r_1$ and $r_2$ store the results of the execution of subthreads $T_1$ and $T_2$, respectively, and both threads either succeed, exchanging the values, or fail. The ascribed outcome is only correct under the assumption that no other threads besides $T_1$ and $T_2$ attempt to use the very same exchange channel concurrently.

Why is the exchanger not a linearizable data structure? To see that, recall that linearizability reduces the concurrent behavior to a sequential one [26]. If the exchanger were linearizable, all possible outcomes of the program (1) would be captured by the following two sequential programs, modelling selected interleavings of the threads $T_1$ and $T_2$:

$$\begin{array}{c} r_1 := \mathsf{exchange}~1;\ r_2 := \mathsf{exchange}~2; \\ \text{and} \\ r_2 := \mathsf{exchange}~2;\ r_1 := \mathsf{exchange}~1; \end{array} \tag{3}$$

However, both programs (3) will always result in $r_1 = r_2 = \mathsf{None}$, as, in order to succeed, a call to the exchanger needs another thread, running concurrently, with which to exchange values. This observation demonstrates that linearizability with respect to a sequential specification is too weak a correctness criterion to capture the exchanger's behavior observed in a truly concurrent context [22]: an adequate notion of correctness for exchange must mention the effect of interference.

* We use an ML-style option data type with two constructors, Some and None, to indicate success and failure of an operation, respectively.

Consider another structure, whose concurrent behavior cannot be related to sequential executions via linearizability:

    flip2 (x : ptr nat) : nat = {
      a := flip x;
      b := flip x;                      (4)
      return a + b }

The procedure flip2 takes a pointer x, whose value is either 0 or 1, and changes its value to the opposite, twice, via the atomic operation flip, returning the sum of the previous values. Assuming that x is being modified only by the calls to flip2, what is the outcome r of the following program?

    r := flip2 x;                       (5)

The answer depends on the presence or absence of interfering threads that invoke flip2 concurrently with the program (5). Indeed, in the absence of interference, flip2 will flip the value of x twice, returning the sum of 0 and 1, i.e., 1. However, in the presence of other threads calling flip2 in parallel, the value of r may vary from 0 to 2.

What are the intrinsic properties of flip2 to be specified? Since the effect of flip2 is distributed between two internal calls to flip, both subject to interference, the specification should capture that the variation in flip2's result is subject to interference. Furthermore, the specification should be expressive enough to allow reasoning under bounded interference. For example, absent interference from any other threads besides $T_1$ and $T_2$ that invoke flip2 concurrently, the program below will always result in $r = 2$:

$$\bigl(\underbrace{r_1 := \mathsf{flip2}~x}_{T_1} \;\parallel\; \underbrace{r_2 := \mathsf{flip2}~x}_{T_2}\bigr);\quad r := r_1 + r_2 \tag{6}$$
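
The claim about program (6) can be checked by brute force on a small operational model. The sketch below is only an illustration, not part of the formal development: it assumes that x starts at 0, that each flip is a single atomic step returning the previous value, and it uses hypothetical names (`run`, `schedules`) that do not appear in the paper.

```python
from itertools import permutations


def run(schedule, x0=0):
    """Execute four atomic flips in the given thread order; return (r1, r2)."""
    x = x0
    results = {1: 0, 2: 0}
    for tid in schedule:
        results[tid] += x   # flip returns the previous value of x ...
        x = 1 - x           # ... and stores its complement
    return results[1], results[2]


# All distinct interleavings of two threads that each perform two flips.
schedules = sorted(set(permutations([1, 1, 2, 2])))
outcomes = {s: run(s) for s in schedules}

# Possible values of r = r1 + r2, and of the individual results r1, r2.
sums = {r1 + r2 for r1, r2 in outcomes.values()}
individual = {r for pair in outcomes.values() for r in pair}

print(sums, individual)
```

In this model every interleaving yields the same total, while the individual results still range over 0, 1 and 2, which is the behavior a specification of flip2 has to account for.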


2.1 Abstract histories of non-linearizable objects 

Execution histories capture the traces of a concurrent object's interaction with various threads, and are a central notion for specifying concurrent data structures. For example, linearizability specifies the behavior of an object by mapping the object's global history of method invocations and returns to a sequence of operations that can be observed when the object is used sequentially [26]. However, as we have shown, neither exchange nor flip2 can be understood in terms of sequential executions.

We propose to specify the behavior and outcome of such objects in terms of abstract concurrent histories, as follows. Instead of tracking method invocations and returns, our histories track the "interesting" changes to the object's state. What is "interesting" is determined by the user, depending on the intended clients of the concurrent object. Moreover, our specifications are subjective (i.e., thread-relative) in the following sense. Our histories do not identify threads by their thread IDs. Instead, each method is specified by relating two different history variables: the history of the invoking thread (a.k.a. self-history), and the history of its concurrent environment (a.k.a. other-history). In each thread, these two variables have different values.

For example, in the case of the exchanger, the interesting changes to the object's state are the exchanges themselves. Thus, the global history $\chi_E$ tracks the successful exchanges in the form of pairs of values, as shown below:

[Diagram: the global history $\chi_E$, containing the entries (1,2), (2,1), (4,5), (5,4), (9,8); each entry is labeled with the thread that made it ($T_1$, $T_2$, ...), and each successful exchange is marked "exchange ok".]

The diagram presents the history from the viewpoint of thread $T_1$. The exchanges made by $T_1$ are colored white, determining the self-history of $T_1$. The gray parts are the exchanges made by the other threads (e.g., $T_2$, $T_3$, etc.), and determine the other-history for $T_1$.

The subjective division between self and other histories emphasizes that a successful exchange is actually represented by two pairs of numbers $(x, y)$ and $(y, x)$ that appear consecutively in $\chi_E$ and encode the two ends of an exchange from the viewpoint of the exchanging threads. We call such pairs twins. As an illustration, the white entry (2,1) from the self-history of $T_1$ is matched by a twin gray entry (1,2) from the other-history of $T_1$, encoding that $T_1$ exchanging 2 for 1 corresponds to $T_1$'s environment exchanging 1 for 2.

The subjective division is important, because it will enable us to specify threads locally, i.e., without referring to the code of other threads. For example, in the case of program (1), we will specify that $T_1$, in the case of a successful exchange, adds a pair $(1, r_1)$ to its self history, where Some $r_1$ is $T_1$'s return value. Similarly, $T_2$ adds a pair $(2, r_2)$ to its self history, where Some $r_2$ is $T_2$'s return value.

On the other hand, it is an important invariant of the exchanger object (but not of any individual thread) that twin entries are symmetric pairs encoding different viewpoints of one and the same exchange. This object invariant will allow us to reason about clients containing combinations of exchanging threads. Taking program (1) as an example again, the object invariant, together with the individual specifications of $T_1$ and $T_2$, will imply that $r_1$ must equal 2 and $r_2$ must equal 1, if no threads interfered with $T_1$ and $T_2$.
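
To make the role of the twin invariant concrete, here is a minimal Python sketch. It models a global exchanger history as a list of (given, received) pairs in timestamp order; the function name `twins_ok` and the list encoding are assumptions made for illustration and are not the paper's formalization.

```python
def twins_ok(history):
    """Check that the history splits into consecutive twin pairs (x, y), (y, x)."""
    if len(history) % 2 != 0:
        return False
    return all(history[i] == history[i + 1][::-1]
               for i in range(0, len(history), 2))


# A well-formed history in the style of the diagram above.
print(twins_ok([(1, 2), (2, 1), (4, 5), (5, 4)]))

# Program (1) without interference: if both calls succeed, the history holds
# exactly the entries (1, r1) and (2, r2), in some order. We search a small
# range of candidate results for those that the invariant admits.
candidates = [
    (r1, r2)
    for r1 in range(5)
    for r2 in range(5)
    if twins_ok([(1, r1), (2, r2)]) or twins_ok([(2, r2), (1, r1)])
]
print(candidates)
```

The only surviving candidate is $r_1 = 2$, $r_2 = 1$: neither thread's specification mentions the other thread, yet the object invariant forces the two results to agree.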

We can similarly employ abstract histories to specify flip2. One way to do it is to notice that the value of the shared counter x will be changing as 0, 1, 0, 1, ..., and exactly two of these values will be contributed by each call to flip2 made by some thread. We can depict a particular total history $\chi_F$ of the flip2 structure as follows:

[Diagram: $\chi_F = [\,1, 0, 1, 0, 1, 0, \ldots\,]$, with the entries labeled, in order, by the contributing threads $T_1, T_2, T_1, T_2, T_3, \ldots$; the two entries of $T_1$ are white and marked as $T_1$.flip2.]

The two "white" contributions are made by thread $T_1$'s call to flip2, while the rest (gray) are contributions by $T_1$'s environment. Since the atomic flip operation returns the complementary (i.e., previous) value of the counter, the overall result of $T_1$'s call in this case is $\bar{1} + \bar{1} = 0 + 0 = 0$.

The invariant for the flip2 structure postulates the interleaving 0/1-shape of the history and also ensures that the last history entry is x's current value. This will allow us to reason about clients of flip2, such as (6). In the absence of interference, we can deduce that the two parallel calls to flip2 have contributed four consecutive entries to the history $\chi_F$, with each thread contributing precisely two of them. For each of the two calls, the result equals the sum of the two complementary values for what the corresponding thread has contributed to the history, hence, the overall sum $r_1 + r_2$ is 2.
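
The same conclusion can be reached at the level of histories alone, without looking at any interleaving of code. The sketch below assumes x initially holds 0 and encodes $\chi_F$ as a plain list of written values; `flip_invariant` and `flip2_result` are hypothetical helper names, not definitions from the paper.

```python
from itertools import combinations


def flip_invariant(history, x, x0=0):
    """History alternates 1, 0, 1, 0, ... (from x0) and ends in the current x."""
    expected = 1 - x0
    for value in history:
        if value != expected:
            return False
        expected = 1 - expected
    return (history[-1] if history else x0) == x


def flip2_result(self_entries):
    """Result of one flip2 call: sum of complements of its two history entries."""
    a, b = self_entries
    return (1 - a) + (1 - b)


history = [1, 0, 1, 0]          # four entries, no external interference
print(flip_invariant(history, x=0))

# Split the four entries between the two threads in every possible way.
totals = set()
for left in combinations(range(4), 2):
    right = [i for i in range(4) if i not in left]
    r1 = flip2_result([history[i] for i in left])
    r2 = flip2_result([history[i] for i in right])
    totals.add(r1 + r2)
print(totals)
```

However the entries are divided between the self and other views, the total is fixed by the invariant, which mirrors the client reasoning described above.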

2.2 Hoare-style specifications for exchange and flip2

The above examples illustrate that subjectivity and object invariants are two sides of the same coin. In tandem, they allow us to specify threads individually, but also reason about thread combinations. We emphasize that in our approach, the invariants are object-specific and provided by the user. For example, we can associate the invariant about twin entries with the exchanger structure, but our method will not mandate the same invariant for other structures for which it is not relevant. This is in contrast to using a fixed correctness condition, such as linearizability, QC, or CAL, which cannot be parametrized by user-defined properties.^

Subjective histories can be encoded in our approach as auxiliary state [37, 42]. Our Hoare triples will specify how programs modify their histories, while the invariants are declared as properties of a chunk of shared state (e.g., resource invariants of [37]). With the two components, we will be able to describe the effects and results of programs declaratively, i.e., without exposing program implementations.

A semi-formal and partial spec of exchange looks as follows, with the white/gray parts denoting self/other contributions to history, from the point of view of the thread being specified (we postpone the full presentation until Section 3):

$$\begin{array}{l} \{\, \chi_E = [\,\ldots\,] \,\} \\ \mathsf{exchange}~v \\ \{\, \text{if } \mathit{res} \text{ is } \mathsf{Some}~w \text{ then } \chi_E = [\,\ldots, (v, w), \ldots\,] \text{ else } \chi_E = [\,\ldots\,] \,\} \end{array} \tag{7}$$

The ellipsis (...) stands for an existentially-quantified chunk of the history. The spec (7) says that a successful exchange adds an entry $(v, w)$ to the self-history (hence, the entry is white). In the case of a failed exchange, no entry is added. In the complete and formal specification in Section 3, we will have to add a timing aspect, and say that the new entry appears after all the history entries from the precondition. We will also have to say that no entries are removed from the other history (i.e., the exchanger cannot erase the behavior of other threads), but we elide those details here.


^ For example, linearizability does not allow users to declare history invariants on a per-object basis. The exchanger example motivated the introduction of the correctness condition CAL [22], which relaxes linearizability, and makes it somewhat more general in this respect, but still falls short of admitting user-defined invariants. flip2 can be specified using a variation of QC [29], but we show that a similar property can be expressed via subjectivity and a user-defined invariant.


The spec of flip2 is defined with respect to the history $\chi_F$:

$$\begin{array}{l} \{\, \chi_F = [\,\ldots\,] \,\} \\ \mathsf{flip2}~x \\ \{\, \exists a\, b,\ \chi_F = [\,\ldots, a, \ldots, b, \ldots\,],\ \mathit{res} = \bar{a} + \bar{b} \,\} \end{array} \tag{8}$$

It says that the return value res is equal to the sum of binary complements $\bar{a} + \bar{b}$ for the thread's two separate self-contributions to the history. Due to the effects of the interference, the history entries a and b may be separated in the overall history by the contributions of the environment, as indicated by ... between them.

2.3 Using subjective specifications in the client code 

The immediate benefit of using Hoare logic is that one can easily reason about programs whose components use different object invariants, whereas there is not much one can say about programs whose components require different correctness conditions. For example, Figure 1 shows a proof sketch for a toy program that uses both exchange and flip2. As each of these methods requires its own auxiliary history variable ($\chi_E$ for the exchanger, and $\chi_F$ for flip2), the combined program uses both, but the proof simply ignores those histories that are not relevant for any specific method (i.e., we can "frame" the specs (7) and (8) wrt. the histories of the objects that they do not depend upon).

The program first forks two instances of flip2, storing the results in $r_1$ and $r_2$ (line 4). Next, two new threads are forked, trying to exchange $r_1$ and $r_2$ (line 8). The conditional (line 12) checks if the exchange was successful, and if so, assigns the sum of exchanged values to t (line 14); otherwise t gets assigned 2. We want to prove via the specs (7) and (8) that in the absence of external interference on flip2's pointer x and the exchanger, the outcome is always t = 2.

Explaining the verification. In addition to the absence of external interference, we assume that the initial value of x is 0, and the initial self-histories for both flip2 and exchange are empty (line 1). Once the flip2 threads are forked, we employ spec (8) for each of them, simply ignoring (i.e., framing out) $\chi_E$, as this history variable does not apply to flip2. Upon finishing, the postconditions of flip2 in line 4 capture the relationship between the contributions to the history $\chi_F$ and the results $r_1$ and $r_2$ of the two calls.

Both postconditions in line 4 talk about the very same history $\chi_F$, just using different colors to express that the contributions of the two threads are disjoint: a and b being white in the left thread implies that a and b are history entries added by the left thread. Thus, they must be gray in the right thread, as they cannot overlap with the entries contributed by the right thread. The right thread cannot explicitly specify in its postcondition that a and b are gray, since the right thread is unaware of the specific contributions of the left thread.

Dually, c and d being white in the right thread in line 4 implies that they must be gray on the left. Thus, overall, in line 5, we know that $\chi_F$ contains all four entries in some permutation, and in the absence of interference, it contains no other entries but these four. From the object invariant on $\chi_F$ it then follows that the entries are some permutation of [1, 0, 1, 0], which makes their sum total $r_1 + r_2 = 2$.

     1  { $x \mapsto 0,\ \chi_F = \emptyset,\ \chi_E = \emptyset$ }
     2  { $\chi_F = [\ldots]$ }  $\parallel$  { $\chi_F = [\ldots]$ }
     3  $r_1 := \mathsf{flip2}~x$  $\parallel$  $r_2 := \mathsf{flip2}~x$
     4  { $\exists a\, b,\ \chi_F = [\ldots, a, \ldots, b, \ldots],\ r_1 = \bar{a} + \bar{b}$ }  $\parallel$  { $\exists c\, d,\ \chi_F = [\ldots, c, \ldots, d, \ldots],\ r_2 = \bar{c} + \bar{d}$ }
     5  { $\chi_F = \mathrm{perm}(a, b, c, d) = [1, 0, 1, 0],\ r_1 = \bar{a} + \bar{b},\ r_2 = \bar{c} + \bar{d}$ }
     6  { $r_1 + r_2 = 2$ }
     7  { $\chi_E = [\ldots]$ }  $\parallel$  { $\chi_E = [\ldots]$ }
     8  $s_1 := \mathsf{exchange}~r_1$  $\parallel$  $s_2 := \mathsf{exchange}~r_2$
     9  { if $s_1$ is Some $v_1$ then $\chi_E = [\ldots, (r_1, v_1), \ldots]$ else $\chi_E = [\ldots]$ }  $\parallel$  { if $s_2$ is Some $v_2$ then $\chi_E = [\ldots, (r_2, v_2), \ldots]$ else $\chi_E = [\ldots]$ }
    10  { $s_1 = \mathsf{Some}~v_1 \land s_2 = \mathsf{Some}~v_2 \implies \chi_E = \mathrm{perm}((r_1, v_1), (r_2, v_2)) = \mathrm{perm}((v_1, r_1), (v_2, r_2))$ }
    11  { $s_1 = \mathsf{Some}~v_1 \land s_2 = \mathsf{Some}~v_2 \implies r_1 = v_2 \land r_2 = v_1$ }
    12  if $s_1$ is Some $v_1$ and $s_2$ is Some $v_2$ then
    13    { $r_1 = v_2,\ r_2 = v_1,\ v_1 + v_2 = 2$ }
    14    $t := v_1 + v_2$ { $t = 2$ } else $t := 2$ { $t = 2$ }

Figure 1. Verification of a concurrent client program using exchange and flip2 in the absence of external interference.

Similarly, we ignore $\chi_F$ while reasoning about calls to exchange via spec (7) (lines 7 and 9). As before, we know that the entry $(r_1, v_1)$, which is white in the left postcondition in line 9, must be gray on the right, and dually for $(r_2, v_2)$. In total, the history $\chi_E$ must contain both of the entries, but, by the invariant, it must also contain their twins. In the absence of any other interference, it therefore must be that $(r_1, v_1)$ is a twin for $(r_2, v_2)$, i.e., $r_1 = v_2$ and $r_2 = v_1$, as line 11 expresses for the case of a successful exchange. The rest of the proof is then trivial.

The sketch relied on several important aspects of program verification in FCSL: (i) the invariants constraining $\chi_F$ and $\chi_E$ were preserved by the methods, (ii) upon joining the threads, we can rely on the disjointness of history contributions of the two threads, in order to combine the thread-local views into a specification of the parent thread, and (iii) we could guarantee the absence of the external interference.

The aspect (i) is a significant component of what it means to specify and verify a concurrent object. As we will show in Sections 3 and 6, defining a sufficiently strong object invariant, and then proving that it is indeed an invariant, i.e., that it is preserved by the implementation of the program, is a major part of the verification challenge. We will explain FCSL rules for parallel composition and hiding in Section 4, justifying the reasoning principles (ii) and (iii).

2.4 Specifying non-linearizable objects in three steps 

As shown by Sections 2.1-2.3, our method for specifying 
and verifying non-linearizable concurrent objects and their 
clients boils down to the following three systematic steps. 


Step 1 (§2.1): Define object-specific auxiliary state and its invariants. The auxiliary state will typically include a specific notion of abstract histories, recording whatever behavior is perceived as essential by the implementor of the object. To account for the variety of object-specific correctness conditions, we do not fix a specific shape for the histories. We do not restrict them to always record pairs of numbers (as in the exchanger), or record single numbers (as in flip2). The only requirement that we impose on auxiliary state in general, and on histories in particular, is that the chosen type of auxiliary state is an instance of the PCM algebraic structure [42], thus providing an abstract, and user-defined, notion of disjointness between self/other contributions.

Step 2 (§2.2): Formulate Hoare-style specifications, parametrized by interference, and verify them. This step provides a suitable "interface" for the methods of the concurrent object, which the clients use to reason, without knowing the details of the object and method implementations. Naturally, the interface can refer to the auxiliary state and histories defined in the previous step. When dealing with non-linearizable objects in FCSL, it is customary to formulate the spec in a subjective way (i.e., using the self/other, or dually white/gray, division between history entries) so that the specification has a way to refer to the effects of the interfering calls to the same object. The amount of interference can be later instantiated with more specific information, once we know more about the context of concurrent threads in which the specified program is being run.

Step 3 (§2.3): Restrict the interference when using object specs for verification of clients. Eventually, thread-local knowledge about the effects of individual clients of one and the same object should be combined into cumulative knowledge about the effect of the composition. To measure this effect, one usually considers the object in a quiescent (interference-free) moment [39]. To model quiescent situations, FCSL provides a program-level constructor for hiding. In particular, hide e executes e, but statically prevents other threads from interfering with e, by making e's auxiliary history invisible. Program e's other contribution is fixed to be empty, thus modeling quiescence.

3. Verifying the Exchanger Implementation 

We now proceed with a more rigorous development of the invariants and specification for the exchanger data structure, necessary to verify its real-world implementation [15], which was so far elided from the overview of the approach.

The exchanger implementation is presented in ML-style pseudo-code in Figure 2. It takes a value v : A and creates an offer from it (line 2). An offer is a pointer p to two consecutive locations in the heap.^ The first location stores v, and the second is a "hole" which the interfering thread tries to fill with a matching value. The hole is drawn from the type hole = U | R | M w. Constructor U signals that the offer is unmatched; R that the exchanger retired (i.e., withdrew) the offer, and does not expect any matches on it; and M w that the offer has been matched with a value w.

The global pointer g stores the latest offer proposed for matching. The exchanger proposes p for matching by making g point to p via the atomic compare-and-set instruction CAS (line 3). We assume that CAS returns the value read, which can be used to determine if it failed or succeeded. If CAS succeeds, the exchanger waits a bit, then checks if the offer has been matched by some w (lines 6, 7). If so, Some w is returned (line 7). Otherwise, the offer is retired by storing R into its hole (line 6). Retired offers remain allocated (thus, the exchanger has a memory leak) in order to avoid the ABA problem, as usual in many concurrent structures [25, 47]. If the exchanger fails to link p into g in line 3, it deallocates the offer p (line 10), and instead tries to match the offer cur that is current in g. If no offer is current, perhaps because another thread already matched the offer that made the CAS in line 3 fail, the exchanger returns None (line 12). Otherwise, the exchanger tries to make a match, by changing the hole of cur into M v (line 14). If successful (line 16), it reads the value w stored in cur that was initially proposed for matching, and returns it. In any case, it unlinks cur from g (line 15) to make space for other offers.
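
The walkthrough above can be exercised on a small executable model. The following Python sketch is an illustration under several assumptions, not the paper's Coq development: offers are heap cells `[value, hole]`, the null pointer is `None`, `sleep` is treated as a mere scheduling point, each `yield` marks a place where the scheduler may switch threads, and all names (`State`, `all_outcomes`, `sequential`) are hypothetical.

```python
U, R = "U", "R"      # hole states: unmatched, retired; a match is ("M", w)
PENDING = object()   # marks a thread that has not returned yet


class State:
    """Shared memory: global pointer g plus a heap of offers [value, hole]."""

    def __init__(self):
        self.g = None          # None models the null pointer
        self.heap = {}
        self.fresh = 0

    def alloc(self, v):
        p, self.fresh = self.fresh, self.fresh + 1
        self.heap[p] = [v, U]
        return p

    def cas_g(self, expected, new):
        old = self.g
        if old == expected:
            self.g = new
        return old             # CAS returns the value read

    def cas_hole(self, p, expected, new):
        old = self.heap[p][1]
        if old == expected:
            self.heap[p][1] = new
        return old


def exchange(st, v):
    """Figure 2 as a generator; each yield is a possible context switch."""
    p = st.alloc(v)                       # line 2
    yield
    b = st.cas_g(None, p)                 # line 3
    yield
    if b is None:
        yield                             # line 5: sleep, only a scheduling point
        x = st.cas_hole(p, U, R)          # line 6
        return ("Some", x[1]) if isinstance(x, tuple) else None  # lines 7-8
    del st.heap[p]                        # line 10
    cur = st.g                            # line 11
    yield
    if cur is None:
        return None                       # line 12
    x = st.cas_hole(cur, U, ("M", v))     # line 14
    yield
    st.cas_g(cur, None)                   # line 15
    yield
    return ("Some", st.heap[cur][0]) if x == U else None         # lines 16-17


def replay(values, schedule):
    """Re-run all threads from scratch, stepping them in schedule order."""
    st = State()
    threads = [exchange(st, v) for v in values]
    results = [PENDING] * len(values)
    for tid in schedule:
        try:
            next(threads[tid])
        except StopIteration as stop:
            results[tid] = stop.value
    return results


def all_outcomes(values):
    """Explore every interleaving by depth-first search over schedules."""
    outcomes = set()

    def dfs(schedule):
        results = replay(values, schedule)
        live = [i for i, r in enumerate(results) if r is PENDING]
        if not live:
            outcomes.add(tuple(results))
        for tid in live:
            dfs(schedule + [tid])

    dfs([])
    return outcomes


def sequential(values):
    """Run the calls one after another on the same shared state."""
    st = State()
    results = []
    for v in values:
        gen = exchange(st, v)
        try:
            while True:
                next(gen)
        except StopIteration as stop:
            results.append(stop.value)
    return tuple(results)


concurrent_outcomes = all_outcomes([1, 2])
sequential_outcome = sequential([1, 2])
print(concurrent_outcomes, sequential_outcome)
```

Under this model, the interleaved runs of `exchange 1` and `exchange 2` produce exactly the two outcomes allowed by assertion (2), whereas the sequential run fails on both calls, in line with the discussion of programs (3).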

3.1 Step 1: defining auxiliary state and invariants 

To formally specify the exchanger, we decorate it with auxiliary state. In addition to histories, necessary for specifying the observable behavior, the auxiliary state is used for capturing the coherence constraints of the actual implementation, e.g., with respect to memory allocation and management of outstanding offers. The state is subjective as described in Section 2: it keeps thread-local auxiliary variables that name the thread's private state (self), but also the private state of all other threads combined (other).

^ In our mechanization, we simplify a bit by making p point to a pair instead.

     1  exchange (v : A) : option A = {
     2    p ← alloc (v, U);
     3    b ← CAS (g, null, p);
     4    if b == null then
     5      sleep (50);
     6      x ← CAS (p+1, U, R);
     7      if x is M w then return (Some w)
     8      else return None
     9    else
    10      dealloc p;
    11      cur ← read g;
    12      if cur == null then return None
    13      else
    14        x ← CAS (cur+1, U, M v);
    15        CAS (g, cur, null);
    16        if x is U then w ← read cur; return (Some w)
    17        else return None }

Figure 2. Elimination-based exchanger procedure.

The subjective state of the exchanger for each thread in this example consists of three groups of two components: (1) the thread-private heap $h_s$ of the thread, and $h_o$ of the environment, (2) a set of outstanding offers $\pi_s$ created by the thread, and $\pi_o$ by the environment, and (3) a time-stamped history of values $\chi_s$ that the thread exchanged so far, and dually $\chi_o$ for the environment. In Section 2, we illustrated subjectivity by means of histories, where we used white and gray entries to describe what here we name $\chi_s$ and $\chi_o$, respectively. Now we see that the dichotomy extends beyond histories, and this example requires the dichotomy applied to heaps, and to sets of offers as well. In addition to self/other components of heaps, permissions and histories, we also need shared (a.k.a. joint) state consisting of two components: a heap $h_j$ storing the offers that have been made, and a map $m_j$ of offers that have been matched, but not yet collected by the thread that made them.

Heaps, sets and histories are all PCMs under the operation of disjoint union,
with the empty heap/set/history as the unit. We overload the notation and
write $x \mapsto v$ for a singleton heap with a pointer $x$ storing value $v$,
and $t \mapsto a$ for a singleton history. Similarly, we apply disjoint union
$\uplus$ and subset $\subseteq$ to all three types uniformly.
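The PCM structure described above can be illustrated with a minimal Python sketch. It assumes heaps and histories are modeled as dictionaries; the names `join` and `is_sub` are hypothetical helpers chosen for this illustration, not part of the paper's mechanization.

```python
from typing import Optional


def join(a: dict, b: dict) -> Optional[dict]:
    """Disjoint union of two finite maps; None when the domains overlap."""
    if a.keys() & b.keys():
        return None  # the join is undefined: this is the partiality of the PCM
    return {**a, **b}


def is_sub(a: dict, b: dict) -> bool:
    """True when every entry of a also occurs in b."""
    return all(k in b and b[k] == v for k, v in a.items())


unit: dict = {}                  # the empty heap/history acts as the unit
heap1 = {"x": 1}                 # singleton heap, x storing 1
heap2 = {"y": 2}                 # singleton heap, y storing 2
both = join(heap1, heap2)        # defined, because the pointers differ
clash = join(heap1, {"x": 7})    # undefined, because x occurs on both sides
```

The same two functions work unchanged for histories (time-stamps as keys) and, with sets in place of dictionaries, for sets of offers.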

We next describe how the exchanger manipulates the above variables. First,
$h_j$ is a heap that serves as the "staging" area for the offers. It includes
the global pointer g. Whenever a thread wants to make an offer, it allocates a
pointer p in $h_s$, and then tries to move p from $h_s$ into $h_j$,
simultaneously linking g to p, via the CAS in line 3 of Figure 2.

Second, $\pi_s$ and $\pi_o$ are sets of offers (hence, sets of pointers) that
determine offer ownership. A thread that has the offer $p \in \pi_s$ is the
one that created it, and thus has the sole right to retire p, or to collect
the value that p was matched with. Upon collection or retirement, p is removed
from $\pi_s$.

Third, $\chi_s$ and $\chi_o$ are exchanger-specific histories, each mapping a
time-stamp (isomorphic to nats) to a pair of exchanged values. A singleton
history $t \mapsto (v, w)$ symbolizes that a thread having this singleton as a
subcomponent of $\chi_s$ has exchanged $v$ for $w$ at time $t$. As we describe
below, the most important invariant of the exchanger is that each such
singleton is matched by a "symmetric" one to capture that another thread has
simultaneously exchanged $w$ for $v$. Classical linearizability cannot express
this simultaneous behavior, making the exchanger non-linearizable.

Fourth, $m_j$ is a map storing the offers that were matched, but not yet
acknowledged and collected. Thus, $\mathrm{dom}\ m_j = \pi_s \uplus \pi_o$. A
singleton entry in $m_j$ has the form $p \mapsto (t, v, w)$ and denotes that
offer p, initially storing $v$, was matched at time $t$ with $w$. A singleton
entry is entered into $m_j$ when a thread on one end of the matching matches
$v$ with $w$. Such a thread also places the twin entry
$\bar{t} \mapsto (w, v)$, with the order of $v$ and $w$ inverted, into its own
private history $\chi_s$, where:

$$\bar{t} = \begin{cases} t + 1 & \text{if } t \text{ is odd} \\ t - 1 & \text{if } t > 0 \text{ and } t \text{ is even} \end{cases}$$

For technical reasons, 0 is not a valid time-stamp, and has no distinct twin.
The pending entry for p resides in $m_j$ until the thread that created the
offer p decides to "collect" it. It removes p from $m_j$, and simultaneously
adds the entry $t \mapsto (v, w)$ into its own $\chi_s$, thereby logically
completing the exchange. Since twin time-stamps are consecutive integers, a
history cannot contain entries between twins.
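As a small illustration of the twin time-stamp definition above, the following Python sketch computes the twin of a time-stamp. The function name `twin` is an assumption made for this example.

```python
def twin(t: int) -> int:
    """Twin of a valid time-stamp: odd t pairs with t + 1, even t with t - 1."""
    if t <= 0:
        raise ValueError("0 is not a valid time-stamp and has no distinct twin")
    return t + 1 if t % 2 == 1 else t - 1


# Time-stamps pair up as (1, 2), (3, 4), (5, 6), ...
pairs = [(t, twin(t)) for t in (1, 3, 5)]
```

Taking the twin twice yields the original time-stamp, so the two ends of an exchange always point at each other.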

Thus, two twin entries in the combined history including $\chi_s$, $\chi_o$
and $m_j$ jointly represent a single exchange, as if it occurred atomically.
Concurrency-aware histories [22] capture this by making the ends of an
exchange occur as simultaneous events. We capture it via twin time-stamps.
More formally, consider $\chi = \chi_s \uplus \chi_o \uplus \|m_j\|$. Then,
the exchanger's main invariant is that $\chi$ always contains matching twin
entries:

$$t \mapsto (v, w) \subseteq \chi \iff \bar{t} \mapsto (w, v) \subseteq \chi \qquad (9)$$

Here $\|m_j\|$ is the collection of all the entries in $m_j$. That is,
$\|\emptyset\| = \emptyset$, and
$\|p \mapsto (t, v, w) \uplus m_j'\| = t \mapsto (v, w) \uplus \|m_j'\|$.

In our implementation, we prove that atomic actions, such as CAS, preserve the
invariant; therefore the whole program, being just a composition of actions,
does not violate it.
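The invariant (9) can be checked on concrete states with a short Python sketch. It assumes histories are dictionaries from time-stamps to pairs and that the map of matched offers is a dictionary from offer pointers to triples; the names `flatten` and `has_matching_twins` and the sample values are hypothetical.

```python
def twin(t: int) -> int:
    """Twin time-stamp: odd t pairs with t + 1, even t > 0 with t - 1."""
    return t + 1 if t % 2 == 1 else t - 1


def flatten(m_j: dict) -> dict:
    """Collect the entries of m_j into a history, forgetting the offer pointers."""
    return {t: (v, w) for (t, v, w) in m_j.values()}


def has_matching_twins(chi_s: dict, chi_o: dict, m_j: dict) -> bool:
    """Check invariant (9) on the combined history of self, other and m_j."""
    parts = [chi_s, chi_o, flatten(m_j)]
    chi = {}
    for part in parts:
        chi.update(part)
    if len(chi) != sum(len(part) for part in parts):
        return False  # the three components must have disjoint time-stamps
    return all(chi.get(twin(t)) == (w, v) for t, (v, w) in chi.items())


# Offer p stored "a" and was matched with "b" at time 1; the matching thread
# (part of the environment here) has already recorded the twin entry.
pending = has_matching_twins({}, {2: ("b", "a")}, {"p": (1, "a", "b")})
# After the owner collects the offer, the entry moves from m_j into chi_s.
collected = has_matching_twins({1: ("a", "b")}, {2: ("b", "a")}, {})
# A lone entry without its twin violates the invariant.
broken = has_matching_twins({1: ("a", "b")}, {}, {})
```

The first two states differ only in where the entry for time 1 lives, which is why collecting an offer preserves the invariant.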

3.2 Step 2: Hoare-style specification of Exchanger 

We can now give the desired formal Hoare-style spec. 

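$$
\begin{array}{l}
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\| \,\} \\
\quad \mathsf{exchange}\ v \\
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|, \\
\quad \text{if } \mathsf{res} \text{ is } \mathsf{Some}\ w \text{ then } \exists t.\ \chi_s = t \mapsto (v, w),\ \mathsf{last}(\eta) < t, \bar{t} \ \text{ else } \chi_s = \emptyset \,\}
\end{array}
\qquad (10)
$$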


The precondition says that the exchanger starts with the empty private heap
$h_s$, set of offers $\pi_s$ and history $\chi_s$; hence by framing, it can
start with any value for these components.^ The logical variable $\eta$ names
the initial history of all threads, $\chi_o \uplus \|m_j\|$, which may grow
during the call; thus, we use subset instead of equality to make the
precondition stable under other threads adding entries to $\chi_o$ or $m_j$.

In the postcondition, the self heap $h_s$ and the set of offers $\pi_s$ did
not change. Hence, if exchange made an offer during its execution, it also
collected or retired it by the end. The history $\eta$ is still a subset of
the ending value of $\chi_o \uplus \|m_j\|$, signifying that the environment
history only grows by interference. We will make crucial use of this part of
the spec when verifying a client of the exchanger in Section 5.

If the exchange fails (i.e., res is None), then $\chi_s$ remains empty. If it
succeeds (either in line 7 or line 16 in Figure 2), i.e., if the result res is
Some w, then there exists a time-stamp $t$, such that the self-history
$\chi_s$ contains the entry $t \mapsto (v, w)$, symbolizing that $v$ and $w$
were exchanged at time $t$.

Importantly, the postcondition implies, by invariant (9), that in the success
case, the twin entry $\bar{t} \mapsto (w, v)$ must belong to
$\chi_o \uplus \|m_j\|$, i.e., another thread matched the exchange (this was
made explicit by the spec (7)). Moreover, the exchange occurred after the call
to exchange: whichever $\eta$ we chose in the pre-state, both $t$ and
$\bar{t}$ are larger than the last time-stamp in $\eta$.

The proof outline for the exchanger is available in Ap¬ 
pendix A. In Section 5, after introducing necessary FCSL 
background, we will illustrate Step 3 of our method and 
show how to employ the subjective Hoare spec (10) for mod¬ 
ular verification of a concurrent client. 

4. Background on FCSL 

In order to formally present Step 3 of our method, we first 
need to introduce some important parts of FCSL. 

A Hoare specification in FCSL has the form $\{P\}\ e\ \{Q\}@\mathcal{R}$.
$P$ and $Q$ are the pre- and postcondition for partial correctness, and
$\mathcal{R}$ defines the shared resource on which $e$ operates. The latter is
a state transition system describing the invariants of the state (real and
auxiliary) and the atomic operations that can be invoked by the threads that
simultaneously operate on that state. We elide the transition system aspect of
resources here, and refer to [36] for a detailed treatment.

An important secondary role of a resource is to declare the variables that
$P$ and $Q$ may scope over. For example, in the case of the exchanger, we use
the variables $h_s, \pi_s, \chi_s, h_o, \pi_o, \chi_o, h_j, m_j$. The
mechanism by which the variables are declared is as follows. Underneath, a
resource comes with only three variables: $a_s$, $a_o$ and $a_j$, standing for
abstract self state, other state, and shared (joint) state, but the user can
pick their types depending on the application. In the case of the exchanger,
$a_s$ and $a_o$ are triples containing a heap, an offer-set and a history. The
variables we used in Section 2 are projections out of such triples:
$a_s = (h_s, \pi_s, \chi_s)$ and $a_o = (h_o, \pi_o, \chi_o)$. Similarly,
$a_j = (h_j, m_j)$.

^ Framing in FCSL is similar to that of separation logic, allowing extensions
to the initial state that remain invariant under program execution. In FCSL,
however, framing applies to any PCM-valued state component (e.g., heaps,
histories, etc.), whereas in separation logic, it applies just to heaps.

It is essential that $a_s$ and $a_o$ have a common type exhibiting the
algebraic structure of a PCM, under a partial binary operation $\uplus$. PCMs
give a way, generic in $\mathcal{R}$, to define the inference rule for
parallel composition.

$$\frac{\{P_1\}\ e_1\ \{Q_1\}@\mathcal{R} \qquad \{P_2\}\ e_2\ \{Q_2\}@\mathcal{R}}{\{P_1 \circledast P_2\}\ e_1 \parallel e_2\ \{[\mathsf{res}.1/\mathsf{res}]Q_1 \circledast [\mathsf{res}.2/\mathsf{res}]Q_2\}@\mathcal{R}} \qquad (11)$$

Here, $\circledast$ is defined as follows.

$$(P_1 \circledast P_2)(a_s, a_j, a_o) \iff \exists x_1\, x_2.\ a_s = x_1 \uplus x_2,\ P_1(x_1, a_j, x_2 \uplus a_o),\ P_2(x_2, a_j, x_1 \uplus a_o)$$

Thereby, when a parent thread forks $e_1$ and $e_2$, then $e_1$ becomes part
of the environment for $e_2$, and vice-versa. This is so because the self
component $a_s$ of the parent is split into $x_1$ and $x_2$; $x_1$ becomes the
self part of $e_1$, but $x_2$ is added to the other part $a_o$ of $e_1$ (and
symmetrically for $e_2$).
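The way the parent's self component is redistributed at a fork can be sketched in Python. This is only an illustration under the assumption that the self and other components are histories modeled as dictionaries; `join` and `fork_views` are hypothetical names.

```python
def join(a: dict, b: dict) -> dict:
    """Disjoint union of two histories; overlapping time-stamps are an error."""
    if a.keys() & b.keys():
        raise ValueError("join is undefined on overlapping domains")
    return {**a, **b}


def fork_views(x1: dict, x2: dict, a_o: dict):
    """Subjective (self, other) views of the two children of a fork.

    The parent's self component is x1 joined with x2. Each child keeps its own
    share as self and sees its sibling's share as part of the environment.
    """
    left = (x1, join(x2, a_o))
    right = (x2, join(x1, a_o))
    return left, right


x1 = {1: ("a", "b")}             # share handed to the left child
x2 = {4: ("c", "d")}             # share handed to the right child
a_o = {2: ("b", "a")}            # contribution of the parent's environment
(left_self, left_other), (right_self, right_other) = fork_views(x1, x2, a_o)
```

Both children agree on the total history, since joining self and other gives the same result from either viewpoint; only the split between self and other differs.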

To reason about quiescent moments, we use one more constructor of FCSL:
hiding. The program hide e operationally executes e, but logically installs a
resource within the scope of e. In the case of the exchanger, hide e starts
only with private heaps $h_s$ and $h_o$, then takes a chunk of heap out of
$h_s$ and "installs" an exchanger in this heap, allowing the threads in e to
exchange values. hide e is quiescent wrt. the exchanger, as the typechecker
will prevent composing hide e with threads that want to exchange values
with e.

The auxiliaries $\pi_s, \chi_s, \pi_o, \chi_o$, and $h_j, m_j$, belonging to
the exchanger (denoted as resource $\mathcal{E}$), are visible within hide,
but outside, only $h_s$ persists (denoted as a resource $\mathcal{P}$ for
private state). We elide the general hiding rule [36], and just show the
special case for the exchanger.

$$\frac{\{P\}\ e\ \{Q\}@\mathcal{E}}{\{h_s = \Phi_1(h_j),\ \Phi_1(P)\}\ \mathsf{hide}\ e\ \{\exists \Phi_2.\ h_s = \Phi_2(h_j),\ \Phi_2(Q)\}@\mathcal{P}} \qquad (12)$$

Read bottom-up, the rule says that we can install the exchanger $\mathcal{E}$
in the scope of a thread that works with $\mathcal{P}$, but then we need
substitutions $\Phi_1$ and $\Phi_2$ to map variables of $\mathcal{E}$
($h_s, \pi_s, \chi_s$, etc.) to values expressed with variables from
$\mathcal{P}$ ($h_s$ and $h_o$). $\Phi_1$ is an initial such substitution
(user provided), and the rule guarantees the existence of an ending
substitution $\Phi_2$. The substitutions have to satisfy a number of side
conditions, which we elide here for brevity. The most important one is that
the other variable $a_o = (h_o, \pi_o, \chi_o)$ is fixed to be the PCM unit
(i.e., a triple of empty sets). Fixing $a_o$ to unit captures that hide
protects e from interference.

At the beginning of hide e, the private heap equals the value that $\Phi_1$
gives to $h_j$ ($h_s = \Phi_1(h_j)$). In other words, the hide rule takes the
private heap of a thread, and makes it shared, i.e., gives it to the $h_j$
component of $\mathcal{E}$. Upon finishing, hide e makes $h_j$ private again.

In the subsequent text we elide the resources from specs. 

5. Verifying Exchanger’s Client 

We next illustrate how the formally specified exchanger from Section 3 can be
used by real-world client programs, and how the other component, asserted by
the spec to satisfy $\eta \subseteq \chi_o \uplus \|m_j\|$, is crucial for
their verification. We emphasize that the proof of the client does not see the
implementation details, which are hidden by the spec (10).

While simple, our client is realistic, and has been used in
java.util.concurrent [15]. It is defined as follows. First, the exchanger
loops until it exchanges the value.

exchange' (v : A) : A = {
  w' <- exchange v;
  if w' is Some w then return w else exchange' v }

Next, exchange' is iterated to exchange a sequence in order, appending the
received matches to an accumulator.

ex_seq (vs, ac : seq A) : seq A = {
  if vs is v :: vs' then
    w <- exchange' v; ex_seq (vs', snoc ac w)
  else return ac }

Our goal is to prove, via (10), that the parallel composition

e = ex_seq ($vs_1$, nil) || ex_seq ($vs_2$, nil)

exchanges $vs_1$ and $vs_2$, i.e., returns the pair ($vs_2$, $vs_1$). This
holds only under the assumption that e runs without interference (i.e.,
quiescently), so that the two threads in e have no choice but to exchange the
values between themselves.
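The behavior of the client under quiescence can be observed with a runnable Python sketch. Note the assumptions: `LockExchanger` is a hypothetical lock-based stand-in written for this example, not the CAS-based algorithm of Figure 2; failure is reported as `(False, None)` instead of `None`; recursion is replaced by loops; and both lists are assumed to have the same length, since otherwise the longer thread would retry forever.

```python
import threading
from typing import Any, List, Optional, Tuple


class LockExchanger:
    """Lock-based stand-in for an exchanger (not the algorithm of Figure 2)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Optional[dict] = None   # the single outstanding offer

    def exchange(self, v: Any, timeout: float = 0.05) -> Tuple[bool, Any]:
        with self._lock:
            if self._pending is not None:
                offer = self._pending          # match the outstanding offer
                self._pending = None
                offer["match"] = v
                offer["done"].set()
                return True, offer["value"]
            offer = {"value": v, "match": None, "done": threading.Event()}
            self._pending = offer              # publish our own offer
        offer["done"].wait(timeout)            # give a partner time to arrive
        with self._lock:
            if offer["done"].is_set():
                return True, offer["match"]    # somebody matched our offer
            self._pending = None               # retire the unmatched offer
            return False, None


def exchange_retry(ex: LockExchanger, v: Any) -> Any:
    """Counterpart of exchange': retry until the exchange succeeds."""
    while True:
        ok, w = ex.exchange(v)
        if ok:
            return w


def ex_seq(ex: LockExchanger, vs: List[Any]) -> List[Any]:
    """Exchange the elements of vs in order, accumulating the matches."""
    return [exchange_retry(ex, v) for v in vs]


def run_pair(vs1: List[Any], vs2: List[Any]) -> Tuple[List[Any], List[Any]]:
    """Run two ex_seq threads on a private exchanger, with no other threads."""
    ex = LockExchanger()
    out: List[Any] = [None, None]

    def worker(i: int, vs: List[Any]) -> None:
        out[i] = ex_seq(ex, vs)

    threads = [threading.Thread(target=worker, args=(0, vs1)),
               threading.Thread(target=worker, args=(1, vs2))]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out[0], out[1]
```

Creating the exchanger inside `run_pair` plays the role of hiding in this sketch: no thread outside the pair can reach it, so each thread can only be matched by the other.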

We make the quiescence assumption explicit using the FCSL hide constructor, as
described in Section 4. Thus, we establish the following Hoare triple:

$$\{h_s = g \mapsto \mathsf{null}\}\ \mathsf{hide}\ e\ \{g \in \mathrm{dom}\ h_s,\ \mathsf{res} = (vs_2, vs_1)\} \qquad (13)$$

It says that we start with a heap where g stores null, and end with a possibly
larger heap (due to the memory leak), but with the result ($vs_2$, $vs_1$).
The auxiliaries $\pi_s, \pi_o, \chi_s, \chi_o, h_j, m_j$ are visible inside
hide, but outside, only $h_s$ persists.
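Explaining the verification. We illustrate the verification by listing the
specs of selected subprograms. First, the spec of exchange' easily derives
from (10) by removing the now-impossible failing case.

$$
\begin{array}{l}
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\| \,\} \\
\quad \mathsf{exchange'}\ v \\
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \exists t.\ \chi_s = t \mapsto (v, \mathsf{res}),\ \mathsf{last}(\eta) < t, \bar{t} \,\}
\end{array}
$$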

Next, ex_seq has the following spec:

$$
\begin{array}{l}
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset \,\} \\
\quad \mathsf{ex\_seq}\ (vs, \mathsf{nil}) \\
\{\, \exists ts.\ h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \mathsf{zip}\ ts\ vs\ \mathsf{res}, \\
\quad \mathsf{grows\_notwins}\ ts,\ \mathsf{zip}\ \overline{ts}\ \mathsf{res}\ vs \subseteq \chi_o \uplus \|m_j\| \,\}
\end{array}
$$

Here, $ts$ is a list of time-stamps, and $\mathsf{zip}\ ts\ vs\ ws$ joins up
the singleton histories $t \mapsto (v, w)$, for each $t$, $v$, $w$ drawn, in
order, from the lists $ts$, $vs$, $ws$. The spec says that at the time-stamps
from $ts$, ex_seq exchanged the elements of $vs$ for those of res. That $ts$
is increasing and contains no twins follows from the spec of exchange', which
says that the time-stamps $t$ and $\bar{t}$ that populate $ts$ and
$\overline{ts}$ are larger than anything in $\eta$, and thus only grow with
iteration. From the same postcondition, it follows that
$\chi_o \uplus \|m_j\|$ contains all the twin exchanges, by invariant (9), as
commented in Section 2 about the spec for exchange.
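The ingredients of this spec can be made concrete with a Python sketch. It assumes histories are dictionaries and reads grows_notwins as "strictly increasing and containing no pair of twins", following the explanation above; the function names and the sample time-stamps are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def twin(t: int) -> int:
    """Twin time-stamp: odd t pairs with t + 1, even t > 0 with t - 1."""
    return t + 1 if t % 2 == 1 else t - 1


def zip_hist(ts: List[int], vs: List[Any], ws: List[Any]) -> Dict[int, Tuple[Any, Any]]:
    """Join up the singleton histories t |-> (v, w), taken in order."""
    if not (len(ts) == len(vs) == len(ws)):
        raise ValueError("the three lists must have the same length")
    return {t: (v, w) for t, v, w in zip(ts, vs, ws)}


def grows_notwins(ts: List[int]) -> bool:
    """ts is strictly increasing and never contains both a stamp and its twin."""
    increasing = all(a < b for a, b in zip(ts, ts[1:]))
    no_twins = all(twin(t) not in ts for t in ts)
    return increasing and no_twins


# A thread that exchanged a, b, c for x, y, z at times 1, 4, 5.
ts, vs, res = [1, 4, 5], ["a", "b", "c"], ["x", "y", "z"]
self_hist = zip_hist(ts, vs, res)
# The twin entries that must then exist in the environment's history.
twins_hist = zip_hist([twin(t) for t in ts], res, vs)
```

The twin history swaps both the time-stamps and the order of the exchanged values, which is what lets the client proof match one thread's entries against the other's.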

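Next, by the FCSL parallel composition rule (Section 4):

$$
\begin{array}{l}
\{\, h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset \,\} \\
\quad \mathsf{ex\_seq}\ (vs_1, \mathsf{nil}) \parallel \mathsf{ex\_seq}\ (vs_2, \mathsf{nil}) \\
\{\, \exists ts_1\, ts_2.\ \mathsf{grows\_notwins}\ ts_1,\ \mathsf{grows\_notwins}\ ts_2, \\
\quad h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \mathsf{zip}\ ts_1\ vs_1\ \mathsf{res}.1 \uplus \mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2, \\
\quad \mathsf{zip}\ \overline{ts}_1\ \mathsf{res}.1\ vs_1 \subseteq \mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2 \uplus \chi_o \uplus \|m_j\|, \\
\quad \mathsf{zip}\ \overline{ts}_2\ \mathsf{res}.2\ vs_2 \subseteq \mathsf{zip}\ ts_1\ vs_1\ \mathsf{res}.1 \uplus \chi_o \uplus \|m_j\| \,\}
\end{array}
$$

To explain: $ts$ and res from the left and right ex_seq threads become $ts_1$,
$ts_2$, res.1 and res.2, respectively. The values of each self component
$h_s$, $\pi_s$, $\chi_s$ from the two threads are joined into the self
component of the composition. At the same time, the other component $\chi_o$
of the left (resp. right) thread equals the sum of $\chi_s$ of the right
(resp. left) thread, and the $\chi_o$ of the composition. This formalizes the
intuition that upon forking, the left thread becomes part of the environment
for the right thread, and vice-versa.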

The postcondition says that the self history of e contains both
$\mathsf{zip}\ ts_1\ vs_1\ \mathsf{res}.1$ and
$\mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2$. Thus, $vs_1$ is exchanged for
res.1, and $vs_2$ for res.2. But we further want to derive
$\mathsf{res}.1 = vs_2$ and $\mathsf{res}.2 = vs_1$, i.e., that the lists are
exchanged for each other, in the absence of interference.

We next explain how this desired property follows for hide e, from the two
inequalities in e's postcondition

$$\mathsf{zip}\ \overline{ts}_1\ \mathsf{res}.1\ vs_1 \subseteq \mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2 \uplus \chi_o \uplus \|m_j\| \qquad (14)$$
$$\mathsf{zip}\ \overline{ts}_2\ \mathsf{res}.2\ vs_2 \subseteq \mathsf{zip}\ ts_1\ vs_1\ \mathsf{res}.1 \uplus \chi_o \uplus \|m_j\| \qquad (15)$$

Notice that (14) and (15) are ultimately instances of the conjunct
$\eta \subseteq \chi_o \uplus \|m_j\|$ that was part of the specification
(10), thereby justifying the use of subjective other variables.

We know that $\mathrm{dom}\ m_j = \pi_s \uplus \pi_o$ (from Section 2), that
$\pi_s = \emptyset$ (from e's postcondition), and that by hiding,
$\pi_o = \chi_o = \emptyset$. Thus, towards deriving the postcondition of
hide e, we simplify (14) and (15) into:

$$\mathsf{zip}\ \overline{ts}_1\ \mathsf{res}.1\ vs_1 \subseteq \mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2$$
$$\mathsf{zip}\ \overline{ts}_2\ \mathsf{res}.2\ vs_2 \subseteq \mathsf{zip}\ ts_1\ vs_1\ \mathsf{res}.1$$

Because $ts_1$ and $ts_2$ are increasing lists of time-stamps, and contain no
twins, the above implies $ts_2 = \overline{ts}_1$. Hence:

$$\mathsf{zip}\ \overline{ts}_1\ \mathsf{res}.1\ vs_1 = \mathsf{zip}\ ts_2\ vs_2\ \mathsf{res}.2$$

and thus $\mathsf{res}.1 = vs_2$, $vs_1 = \mathsf{res}.2$. We omit the
remaining technical argument that explains how the heap $h_j$, with the
pointer g, is folded into $h_s$, which ultimately obtains (13).

6. Specifying Counting Networks

We now show how to use subjective histories to specify another class of
non-linearizable objects: counting networks. Counting networks are a special
case of balancing networks introduced by Aspnes et al. [3], themselves
building on sorting networks [2], aimed to implement concurrent counters in a
way free from synchronization bottlenecks. The key idea is to decompose the
workload between several counters, so that each of them is responsible for a
disjoint set of values. A thread trying to increment first approaches the
balancer, which is a logical "switch" that "directs" the thread, i.e.,
provides it with the address of the counter to increment. The balancers make
counting networks' operations non-linearizable, as in the presence of
interference the results of increments might be observed out of order.

1  getAndInc() : nat = {
2    b <- flip (bal);
3    res <- fetchAndAdd2 (c_b);
4    return res }

Figure 3. Simple counting network

Figure 3 presents a schematic outline and a pseudo-code implementation of a
counting network with a single balancer. The implementation contains three
pointers: the balancer bal, which stores either 0 or 1, thus directing threads
to the shared pointers $c_0$ or $c_1$, which count the even and odd values,
respectively. Threads increment by calling getAndInc, which works as follows.
It first atomically changes the bit value of the balancer via a call to the
atomic operation flip (line 2). The flip operation returns the previous value
b of the balancer as a result, thus determining which of the counters, $c_0$
or $c_1$, should be incremented. The thread proceeds to atomically add 2 to
the value of $c_b$ via fetchAndAdd2 (line 3). The old value of $c_b$ is
returned as the result of the procedure.^

^ In the counting network from Figure 3, the balancer itself might seem like
a contention point. However, the flip operation is much less expensive than
CAS as a synchronization mechanism. The performance can be further improved by
constructing a diffracting tree of several balancers [25, § 12.6], but we do
not consider diffracting trees here.

Assuming that $c_0$ and $c_1$ are initialized with 0 and 1, it is easy to see
that in a single-threaded program, the network will behave as a conventional
counter; that is, consecutive invocations of getAndInc return consecutive
nats. However, in the concurrent setting, getAndInc may return results out of
order, as follows.

Example 6.1. Consider two threads, $T_1$ and $T_2$, operating on the network
initialized with bal $\mapsto 0$, $c_b \mapsto b$. $T_1$ calls getAndInc and
executes its line 2 to set bal to 1. It gets suspended, so $T_2$ proceeds to
execute lines 2 and 3, therefore setting bal back to 0 and returning 1. While
$T_1$ is still suspended, $T_2$ calls getAndInc again, gets directed to $c_0$,
and returns 0, after it has just returned 1.
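The interleaving of Example 6.1 can be replayed deterministically in Python. This sketch assumes a sequential model in which each call to `flip` or `fetch_and_add2` stands for one atomic step; the class `Network` and the variable names are chosen for this illustration.

```python
class Network:
    """Single-balancer counting network; each method models one atomic step."""

    def __init__(self) -> None:
        self.bal = 0          # balancer bit
        self.c = [0, 1]       # c_0 counts even values, c_1 counts odd values

    def flip(self) -> int:
        """Complement the balancer bit and return its previous value."""
        old = self.bal
        self.bal = 1 - old
        return old

    def fetch_and_add2(self, b: int) -> int:
        """Add 2 to counter c_b and return its previous value."""
        old = self.c[b]
        self.c[b] = old + 2
        return old


net = Network()
b1 = net.flip()                          # T1 runs line 2, then is suspended
first = net.fetch_and_add2(net.flip())   # T2 runs a complete getAndInc
second = net.fetch_and_add2(net.flip())  # T2 runs getAndInc again
late = net.fetch_and_add2(b1)            # T1 finally resumes with line 3

# For comparison, four calls on a fresh network without any interleaving.
seq_net = Network()
sequential = [seq_net.fetch_and_add2(seq_net.flip()) for _ in range(4)]
```

In this replay $T_2$ observes 1 and then 0, while the uninterrupted run yields consecutive numbers.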

This out-of-order behavior, however, is not random, and can be precisely
characterized as a function of the number of threads operating on the network
[1, 29]. In the rest of this section and in Section 7, we show how to capture
such bounds in the spec using auxiliary state of (subjective) histories in a
client-sensitive manner. As a form of road map, we list the desired
requirements for the spec of getAndInc, adapting the design goals of criteria
such as QC, QQC and QL [1, 3, 29], which we will proceed to verify formally,
following Step 1 and Step 2 of our approach, and then employ in client-side
reasoning via Step 3:

• R1: Two different calls to getAndInc should return distinct results (strong
concurrent counter semantics).

• R2: The results of calls to getAndInc, separated by a period of quiescence
(i.e., absence of interference), should appear in their sequential order
(quiescent consistency).

• R3: The results of two sequential calls $C_1$ and $C_2$ in a single thread
should be out of order by no more than $2N$, where $N$ is the number of
interfering calls that overlap with $C_1$ and $C_2$ (quantitative quiescent
consistency).

6.1 Step 1: counting network’s histories and invariants 

To formalize the necessary invariants, we elaborate the counting network with
auxiliary state: tokens (isomorphic to nats) and novel interference-capturing
histories.

A token provides a thread that owns it with the right to increment an
appropriate counter [3]. In our example, a thread that performs the flip in
line 2 of getAndInc will be awarded a token which it can then spend to execute
fetchAndAdd2. Thus, any individual token represents a "pending" call to
getAndInc, and the set of unspent tokens serves as a bound on the out-of-order
behavior that the network exhibits. We introduce auxiliary variables for the
held tokens: $\tau_s$ keeps the tokens owned by the self thread, with its even
and odd projections $\tau_s^0$ and $\tau_s^1$, such that
$\tau_s = \tau_s^0 \uplus \tau_s^1$, administering access to $c_0$ and $c_1$,
respectively. Similarly, $\tau_o$, featuring the same projections, keeps the
tokens owned by the other thread. We abbreviate
$\tau^i = \tau_s^i \uplus \tau_o^i$ for $i = 0, 1$.

Figure 4 illustrates a network with three even tokens in $\tau^0$, held by
threads that will increment $c_0$, and one odd token in $\tau^1$, whose owner
will increment $c_1$.

A history of the counting network is an auxiliary finite map, consisting of
entries of the form $t \mapsto (\iota, z)$. Such an entry records that the
value $t$ has been written into an appropriate counter ($c_0$ or $c_1$,
depending on the parity of $t$), at the moment when $\iota^0$ and $\iota^1$
held the values of $\tau$'s even/odd projections $\tau^0$ and $\tau^1$,
respectively. Moreover, in order to write $t$ into a counter, the token $z$
was spent by the thread. We will refer to $z$ as the spent token. Notice that
the entries in the history contain tokens held by both self and other threads.
Thus, a history captures the behavior of a thread subjectively, i.e., as a
function of the interfering threads' behavior.


[Figure 4 (diagram): tokens of pending threads, the current value of the
balancer, and the histories of the counters $c_0$ and $c_1$.]

Figure 4. Tokens and histories of the simple network


Similarly to tokens, network histories are represented by the auxiliary
variables $\chi_s$, tracking counter updates (even and odd) performed by the
self thread, and dually $\chi_o$ for the other thread. We abbreviate
$\chi^i = \chi_s^i \uplus \chi_o^i$ for $i = 0, 1$.

Figure 4 illustrates a moment in the network's history and how it relates to
the state of the counters. Only 0 has been written to $c_0$ so far (upon
initialization), hence $\chi^0$ only contains an entry for $t = 0$ (we ignore
at the moment the contents of the history entries). On the other hand,
$\chi^1$ has entries for 1 and 3, because after initialization, one thread has
increased $c_1$. The gray boxes indicate that 0 and 3 are the current values
of $c_0$ and $c_1$, and thus also the latest entries in $\chi^0$ and $\chi^1$,
respectively. In particular, these values will be returned by the next
invocations of fetchAndAdd2. The dashed boxes correspond to the entries to be
contributed by the currently running threads holding the outstanding tokens.

In addition to $\tau$ and $\chi$, which come in flavors private to self and
other threads, we require the following shared variables: (1) $h_j$ for the
joint heap of the network, and (2) $b_j$, $n_j^0$ and $n_j^1$ for the contents
of bal, $c_0$ and $c_1$, respectively.

Invariants of the counting network. The main invariant of the network relates
the number of tokens, the size of histories and the value of the balancer:

$$|\chi^0| + |\tau^0| = |\chi^1| + |\tau^1| + b_j \qquad (16)$$

The equation formalizes the intuition that out-of-order anomalies of the
counting network appear if one of the two counters is too far ahead of the
other one. The invariant (16) provides a bound on such a situation. One
counter can get ahead temporarily, but then there must be a number of threads
waiting to spend their tokens on the other counter. Thus, the other counter
will eventually catch up.

Approaches such as quiescent and quantitative quiescent consistency describe
this situation by referring to the number of unmatched call events in an event
history [10, 29]. In contrast, we formalize this property via auxiliary state:
the sets of tokens $\iota$ recorded in the entry for the number $t$ determine
the environment's capability to add new history entries, and thus "run ahead"
or "catch up" after $t$ has been returned. The other invariants of the
counting network are as follows:

(i) $h_j = \mathsf{bal} \mapsto b_j \uplus c_0 \mapsto n_j^0 \uplus c_1 \mapsto n_j^1$.

(ii) The histories contain disjoint time-stamps.

(iii) The history $\chi^0$ (resp. $\chi^1$) contains all even (resp. odd)
values in $[0, n_j^0]$ (resp. $[1, n_j^1]$). This ensures that $n_j^0$ and
$n_j^1$ are the last time-stamps in $\chi^0$ and $\chi^1$, respectively.

(iv) $\tau^0$, $\tau^1$ and $\mathsf{spent}\ (\chi_s \uplus \chi_o)$ contain
mutually disjoint tokens, where
$\mathsf{spent}\ (t \mapsto (\iota, z) \uplus \chi') = \{z\} \uplus \mathsf{spent}\ \chi'$
and $\mathsf{spent}\ \emptyset = \emptyset$. In other words, a spent token
never appears among the "alive" ones (i.e., in $\tau^0 \uplus \tau^1$).

(v) $t \mapsto (\iota, z) \subseteq \chi_s \uplus \chi_o \implies z \in \iota$.

(vi) For any $t$, $\iota$, $z$:

• $t \mapsto (\iota, z) \subseteq \chi^0 \implies t + 2\,|\iota \cap \tau^0| < n_j^1 + 2\,|\iota \cap \tau^1| + 2$, and

• $t \mapsto (\iota, z) \subseteq \chi^1 \implies t + 2\,|\iota \cap \tau^1| < n_j^0 + 2\,|\iota \cap \tau^0| + 2$.

The invariant (vi) provides quantitative information about the network history
by relating the actual ($n_j^0$, $n_j^1$) and the past ($t$) counter values,
via the current amount of interference ($\tau$) and the snapshot interference
($\iota$). To explain (vi), we resort to the intuition provided by the
following equality, which, however, being not quite valid, cannot be used as
an invariant, as we shall see. Focusing on the first clause in (vi), if
$t \mapsto (\iota, z) \subseteq \chi^0$, then, intuitively:

$$t + 2\,|\iota^0 \setminus \tau^0| + 2\,|\iota \cap \tau^0| = n_j^1 + 2\,|\iota \cap \tau^1| + (2 b_j - 1)$$

The equality says the following. When $t$ is snapshot from $c_0$ and placed
into the history $\chi^0$, the set of outstanding even tokens was $\iota^0$.
By the present time, $c_0$ has been increased $|\iota^0 \setminus \tau^0|$
times, each time by 2, thus $n_j^0 = t + 2\,|\iota^0 \setminus \tau^0|$. What
is left to add to $c_0$ to reach the period of quiescence, when no threads
interfere with us, is $2\,|\iota \cap \tau^0|$. Similar reasoning applies to
$c_1$. It is easy to see that at the period of quiescence, $c_0$ and $c_1$
differ by $2 b_j - 1$; that is, the counter pointed to by bal is behind by 1.
However, the equality is invalid, as $b_j$ can be read off only in the
present, whereas the "intuitive" reasoning behind the equality requires a
value of $b_j$ from a quiescent period in the future. Hence, in order to get a
valid property, we bound $2 b_j - 1$ by 2. For simplicity, we even further
weaken the bounds by dropping $|\iota^0 \setminus \tau^0|$ to obtain (vi); as
it will turn out, even such a simpler bound will suffice for proving R1-R3.

Allowed changes in the counting network. The state of the counting network
(auxiliary and real) can be changed in two possible ways by concurrent
threads. These changes formalize the way the atomic operations flip and
fetchAndAdd2 from Figure 3 (b) work with auxiliary state. Flipping alters the
bit value $b_j$ of bal to the complementary one, $1 - b_j$. It also generates
a token $z$ (of parity $b_j$) and stores it into $\tau_s$. The token is fresh,
i.e., distinct from all alive and spent tokens in
$\tau_s \uplus \tau_o \uplus \mathsf{spent}\ (\chi_s \uplus \chi_o)$.
Incrementation spends a token $z$ from $\tau_s$, and depending on its parity
$i$, it atomically increases the value $n_j^i$ of $c_i$ by two, while
simultaneously removing $z$ from $\tau_s$ (thus, the precondition is that
$z \in \tau_s$). It also adds the entry
$(n_j^i + 2) \mapsto (\tau^0 \uplus \tau^1, z)$ to $\chi_s$, thus snapshoting
the values of $\tau^0$ and $\tau^1$. It is easy to check that both these
allowed changes preserve the state-space invariants (16), (i)-(vi), and that
their effect on real state (with auxiliary state erased) is that of flip and
fetchAndAdd2.
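The two allowed changes, together with the main invariant (16), can be exercised with a Python sketch. It makes several simplifying assumptions: self and other are merged into a single view, tokens are natural numbers whose parity matches their counter, the initial histories and token numbering follow the "default" history used for initialization in Section 7, and the snapshot is taken before the spent token is removed so that invariant (v) holds. All class and method names are hypothetical.

```python
class InstrumentedNetwork:
    """Counting network with auxiliary tokens and histories (merged view)."""

    def __init__(self) -> None:
        self.bal = 0
        self.c = [0, 1]
        self.tokens = [set(), set()]              # alive even and odd tokens
        self.hist = [{0: (frozenset({0}), 0)},    # history of c_0
                     {1: (frozenset({1}), 1)}]    # history of c_1
        self._fresh = [2, 3]                      # next unused token per parity

    def flip(self):
        """Complement bal and award a fresh token of the old bit's parity."""
        b = self.bal
        self.bal = 1 - b
        z = self._fresh[b]
        self._fresh[b] += 2
        self.tokens[b].add(z)
        return b, z

    def fetch_and_add2(self, b: int, z: int) -> int:
        """Spend token z to add 2 to c_b, recording a snapshot of alive tokens."""
        assert z in self.tokens[b], "a token can only be spent by its holder"
        snapshot = frozenset(self.tokens[0] | self.tokens[1])
        old = self.c[b]
        self.c[b] = old + 2
        self.tokens[b].remove(z)
        self.hist[b][old + 2] = (snapshot, z)
        return old

    def invariant16(self) -> bool:
        """|chi^0| + |tau^0| = |chi^1| + |tau^1| + b_j."""
        left = len(self.hist[0]) + len(self.tokens[0])
        right = len(self.hist[1]) + len(self.tokens[1]) + self.bal
        return left == right


net = InstrumentedNetwork()
checks = [net.invariant16()]
b1, z1 = net.flip()
checks.append(net.invariant16())
b2, z2 = net.flip()
checks.append(net.invariant16())
r2 = net.fetch_and_add2(b2, z2)
checks.append(net.invariant16())
r1 = net.fetch_and_add2(b1, z1)
checks.append(net.invariant16())
```

Every entry of `checks` is taken after one atomic step, mirroring the proof obligation that each allowed change preserves the invariant.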


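$$
\begin{array}{l}
\{\, \tau_s = \emptyset,\ \chi_s = \eta_s,\ \eta_o \subseteq \chi_o, \\
\quad \iota_o \subseteq \tau_o \uplus (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o),\ \mathcal{I}\ \eta_o\ \iota_o \,\} \\
\quad \mathsf{getAndInc}() \\
\{\, \exists \iota\, z.\ \tau_s = \emptyset,\ \chi_s = \eta_s \uplus (\mathsf{res} + 2) \mapsto (\iota, z), \\
\quad \eta_o \subseteq \chi_o,\ \iota_o \subseteq \tau_o \uplus (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o), \\
\quad \mathsf{last}\ (\eta_s \uplus \eta_o) < \mathsf{res} + 2 + 2\,|\iota \cap \iota_o|, \\
\quad \mathsf{ResPast}\ (\eta_s \uplus \eta_o)\ \mathsf{res}\ \iota\ z,\ \mathcal{I}\ \eta_o\ \iota_o \,\}
\end{array}
\qquad (17)
$$

Figure 5. Hoare-style spec of a simple counting network.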

6.2 Step 2: a Hoare spec for getAndInc

Figure 5 provides a Hoare-style spec for getAndInc, verified in our proof
scripts. We use the logical variable $\iota$ and its variants to range over
token sets, and $\eta$ to range over histories.

The precondition starts with an empty token set ($\tau_s = \emptyset$), and
hence by framing, any set of tokens. The initial self-history $\chi_s$ is set
to an arbitrary $\eta_s$.* The precondition records the other components of
the initial state as follows. First, $\eta_o$ names (a subset of) $\chi_o$, to
make it stable under interference, as in Section 2. Next, we use $\iota_o$ to
name the (subset of) initially live tokens $\tau_o$. However, as $\tau_o$ may
shrink due to other threads spending tokens, simply writing
$\iota_o \subseteq \tau_o$ is unstable. Instead, we write
$\iota_o \subseteq \tau_o \uplus (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o)$
to account for the tokens spent by other threads as well. The set
$\tau_o \uplus (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o)$ only
grows under interference, as new live tokens are generated, or old live tokens
are spent, making the inclusion of $\iota_o$ stable. Indeed, one cannot take
any arbitrary $\eta_o$ and $\iota_o$ to name the other components of the
initial state. Therefore, we constrain these two variables by the invariant
$\mathcal{I}$, which relates them to the self-components of the actual state
and to each other according to the invariants (ii)-(vi).^ This is natural,
since, as we will see in Section 7, all clients instantiate $\eta_o$ and
$\iota_o$ with the other-components of the actual pre-state, respecting
(ii)-(vi).

The postcondition asserts that the final token set $\tau_s$ is also empty
(i.e., the token that getAndInc generates by flip is spent by the end). The
history $\chi_s$ is increased by an entry
$(\mathsf{res} + 2) \mapsto (\iota, z)$, corresponding to writing the value of
the result (plus two) into one of the network's counters, snapshoting the
tokens of that moment into $\iota$, and spending the token $z$ on the write.
$\eta_o$ is a subset of the new value of $\chi_o$, and $\iota_o$ is a subset
of the new value of
$\tau_o \uplus (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o)$, by
the already discussed stability.

The next inequality describes where the entry for res + 2 is placed wrt. the
pre-state history $\eta = \eta_s \uplus \eta_o$. $\eta$ may have gaps arising
due to out-of-order behavior of the network, and res + 2 may fill one such
gap. However, there is a bound on how far res (and hence res + 2) may be from
the tail of $\eta$. We express it as a function of $\iota_o \cap \iota$,
derived from the bounds in (vi), taking res + 2 for $t$ and over-approximating
the instant value $n_j^i$ of the incremented counter via
$\mathsf{last}\ (\eta_s \uplus \eta_o)$. The inequality weakens the invariant
(vi), making it hold for even and odd entries by moving
$2\,|\iota \cap \iota_o^i|$ (for $i = 0, 1$) to the right side of $<$ and
joining them, since $\iota_o^0 \cap \iota_o^1 = \emptyset$.

* Alternatively, we could have also taken $\chi_s = \emptyset$, but the
clients will require generalizing to $\chi_s = \eta_s$ by FCSL's frame rule
[42]. To save space and simplify the discussion, we immediately frame wrt. the
auxiliary $\chi_s$. Our examples do not require such client-side framing wrt.
$\tau_s$.

^ That is, $\eta_o$ and $\iota_o$ take the role of $\chi_o$ and $\tau_o$ in
invariants (ii)-(vi), with
$n_j^i = \mathsf{last}\ (\chi_s^i \uplus \eta_o^i)$. The formal definition of
$\mathcal{I}$ is in our proof scripts.

Finally, the predicate ResPast provides more bounds that we will need in the
proofs of the client code's properties.

$$
\begin{array}{l}
\mathsf{ResPast}\ \eta\ \mathsf{res}\ \iota\ z \ \triangleq\ \iota \subseteq \tau_o \uplus (\mathsf{spent}\ \chi_o) \uplus \{z\}, \\
\quad \forall t\, \iota'.\ t \mapsto (\iota', -) \subseteq \eta \implies z \notin \iota',\ t < \mathsf{res} + 2 + 2\,|\iota' \cap \iota|
\end{array}
\qquad (18)
$$

When instantiated with $\eta = \eta_s \uplus \eta_o$, ResPast says the
following. The token set $\iota$, snapshot when res + 2 was committed to
history, is a subset of all the tokens in the post-state, including the live
ones ($\tau_o$) and the spent ones ($\mathsf{spent}\ \chi_o \uplus \{z\}$).
Moreover, if $t$ is an entry in $\eta$, with contents $(\iota', -)$, then:
(1) $z \notin \iota'$, because $z$ is a token generated when getAndInc
executed flip. Hence, $z$ is fresh wrt. any token-set from the pre-state
history $\eta$; and (2) $t$ and $\iota'$ satisfy the same bounds wrt. res + 2
as those described for the last history entry and $\iota_o$.

How will the spec (17) be used? The clause
$\chi_s = \eta_s \uplus (\mathsf{res} + 2) \mapsto -$ of (17), in conjunction
with the invariant (ii), ensures that any two calls to getAndInc, sequential
or concurrent, yield different history entries, and hence different results.
This establishes R1, which we will not discuss further.

The inequality on $\mathsf{last}\ (\eta_s \uplus \eta_o)$ will provide for R2
in client reasoning. To see how, consider the particular case when $\iota_o$
is empty, i.e., the pre-state is quiescent. In that case, the intersection
with $\iota$ is empty, and we can infer that res + 2 is larger than either
counter's value in the pre-state. As we shall see in Section 7, this captures
the essence of QC.

Finally, the predicate ResPast (18) establishes a bound for the "out-of-order"
discrepancy between the result res and any value $t$ committed to the history
in the past, via $2\,|\iota' \cap \iota|$, where $\iota'$ is the token set
recorded with $t$. We will further bound this value using the size of $\iota$,
and the inclusion
$\iota \subseteq \tau_o \uplus \mathsf{spent}\ \chi_o \uplus \{z\}$ from (18).
These bounds will ultimately enable us to derive the requirement R3.

7. Verifying Counting Network’s Clients 

Following Step 3 of our verification method, we now illustrate requirements R2 and R3 from the previous section via two different clients, each executing two sequential calls to getAndInc. Both clients are higher-order, i.e., they are parametrized by subprograms, which can be “plugged in”. The first client will exhibit quiescence between the two calls, and we will prove that the call results appear in order, as required by R2. The second client will experience interference from a program with N concurrent calls to getAndInc, and we will derive a bound on the results in terms of N, as required by R3.

Both our examples will rely on the general mechanism of hiding, presented in Section 4, as a way to logically restrict the interference on a concurrent object, in this case, a counting network, in a lexically-scoped way. To “initialize” the counting network data structure, we provide the starting values for the shared heap ($h_0$) and for the history ($\eta_0$), assuming that the initial set of tokens is empty:

$$h_0 \triangleq \emptyset \uplus c_0 \mapsto 0 \uplus c_1 \mapsto 1 \qquad \eta_0 \triangleq \{0 \mapsto (\{0\}, 0),\ 1 \mapsto (\{1\}, 1)\} \qquad (19)$$

That is, $\eta_0$ provides the “default” history for the initial values 0 and 1 of $c_0$ and $c_1$, with the corresponding tokens represented by numbers 0 and 1. As always with hiding, the postcondition of the hidden program will imply that $\tau_o$ and $\chi_o$ are both empty, as there is no interference at the end.

    $\{\, \tau_s = \emptyset,\ \chi_s = \eta_s,\ \eta_o \subseteq \chi_o,\ I\ \eta_o\ \iota_o,\ \iota_o \subseteq \tau_o \cup (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o) \,\}$
    getAndInc() ∥ e_i
    $\{\, \exists \iota\ \eta_i.\ \tau_s = \emptyset,\ \chi_s = \eta_s \uplus \eta_i \uplus (\mathit{res}.1 + 2) \mapsto (\iota, -),\ \eta_o \subseteq \chi_o,\ \iota_o \subseteq \tau_o \cup (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o),\ I\ \eta_o\ \iota_o,\ \mathsf{last}(\eta_s \cup \eta_o) < \mathit{res}.1 + 2 + 2\,|\iota \cap \iota_o| \,\}$

Figure 6. Parallel composition of getAndInc and $e_i$ in (20).

7.1 Exercising quiescent consistency 

Our first client is the following program $e_{qc}$:

    1  (res1, −) ← (getAndInc() ∥ e_1);
    2  (res2, −) ← (getAndInc() ∥ e_2);          (20)
    3  return (res1, res2)

Each of the calls to getAndInc interferes with either $e_1$ or $e_2$, but in the absence of external interference, the quiescent state is reached between the lines 1 and 2. Hence, after executing hide $e_{qc}$, it should be $\mathit{res}_1 < \mathit{res}_2$, following R2.

The programs $e_1$ and $e_2$ can invoke getAndInc and modify the counters concurrently with the two calls of $e_{qc}$, which we capture by giving both the following generic spec:

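    $\{\, \chi_s = \emptyset,\ \tau_s = \emptyset,\ \iota \subseteq \tau_o \cup \mathsf{spent}\ \chi_o \,\}$
    e_i                                            (21)
    $\{\, \exists \eta_i.\ \chi_s = \eta_i,\ \tau_s = \emptyset,\ \iota \subseteq \tau_o \cup \mathsf{spent}\ \chi_o \,\}$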

The postcondition allows for a number of increments via calls to getAndInc, which is reflected in the addition of $\eta_i$ to $\chi_s$. However, all such calls are required to be finished by the end of $e_i$ ($\tau_s = \emptyset$). As customary by now, we use the logical variable $\iota$ to name the initial set of other tokens.

Figure 6 provides a spec for each of the parallel compositions in the program (20), proved via the corresponding FCSL inference rule for parallel composition (11). The spec is very similar to (17) with the differences highlighted via gray boxes: (a) the self-history $\chi_s$ is increased by $e_i$’s contribution $\eta_i$ in addition to the entry introduced by getAndInc, (b) the result of the parallel composition is a pair, but we only constrain its first component res.1, resulting from the left subprogram. We also drop the last conjunct with ResPast from (17), which we won’t require for this example.

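Next, we use the spec from Figure 6 to specify and verify the program $e_{qc}$, so far assuming external interference.

    { Fig. 6’s precondition with $\eta_s := \eta_0$, $\eta_o := \chi_o$, and $\iota_o := \tau_o$ }   // P
    (res1, −) ← (getAndInc() ∥ e_1);
    { $\exists \eta_1.\ \tau_s = \emptyset,\ \chi_s = \eta_s',\ \ldots$ with $\eta_o := \chi_o$ and $\iota_o := \tau_o$ }
        where $\eta_s' = \eta_0 \cup \eta_1 \cup (\mathit{res}_1 + 2) \mapsto -$
    (res2, −) ← (getAndInc() ∥ e_2);
    { $\exists \eta_1\ \eta_2\ \iota.\ \tau_s = \emptyset,\ \iota_o \subseteq \tau_o \cup (\mathsf{spent}\ \chi_o \setminus \mathsf{spent}\ \eta_o),\ \mathsf{last}(\eta_s' \cup \eta_o) < \mathit{res}_2 + 2 + 2\,|\iota \cap \iota_o|,\ \ldots$ }
    return (res1, res2);   // =: res
    { $Q(\mathit{res}.1/\mathit{res}_1,\ \mathit{res}.2/\mathit{res}_2)$ }   // Q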

We start by instantiating the logical variables $\eta_s$, $\eta_o$ and $\iota_o$ from Figure 6 with $\eta_0$, current $\chi_o$ and $\tau_o$, respectively, naming the obtained precondition $P$. In the following assertion we focus on the clauses constraining $\tau_s$ and $\chi_s$. To verify the second call, we instantiate $\eta_s$, $\eta_o$ and $\iota_o$ from Figure 6 with $\eta_s' = \eta_0 \cup \eta_1 \cup (\mathit{res}_1 + 2) \mapsto -$, current $\chi_o$ and $\tau_o$, correspondingly, obtaining the postcondition, which we name $Q$.

The inequality in the postcondition $Q$ gives the boundary on the out-of-order position of $\mathit{res}_2$ with respect to the last value in the history captured in between the two parallel compositions. The boundary is given via the size of the intersection of two sets of tokens: the snapshot ($\iota$) and those “alive” between the calls ($\iota_o$). Now, to ensure the absence of external interference, we consider the program (hide $e_{qc}$). By the general property of hiding (Section 4), we know that at the final state there is no interference, hence $\tau_o = \emptyset$ and $\chi_o = \emptyset$ in $Q$. Therefore, from the set inclusion on $\iota_o$ in $Q$ (the grayed part), we deduce that $\iota_o = \emptyset$. As a consequence, the intersection $\iota \cap \iota_o = \emptyset$, so from the inequality we obtain

$$\mathsf{last}(\eta_s' \cup \eta_o) < \mathit{res}.2 + 2 \qquad (22)$$

But $\eta_s'$ is defined as $(\mathit{res}.1 + 2) \mapsto - \cup \ldots$, hence $\mathit{res}.1 + 2 \in \mathsf{dom}\ \eta_s'$, and thus $\mathit{res}.1 + 2 \le \mathsf{last}\ \eta_s'$. Even more:

$$\mathit{res}.1 + 2 \le \mathsf{last}(\eta_s' \cup \eta_o). \qquad (23)$$

From (22) and (23) follows the result R2: $\mathit{res}.1 < \mathit{res}.2$.
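Written as a single chain, and reading (23) as a non-strict bound (a time-stamp in the domain of a history can at most equal the last one), the final step is:

```latex
\[
\mathit{res}.1 + 2 \;\le\; \mathsf{last}(\eta_s' \cup \eta_o) \;<\; \mathit{res}.2 + 2
\quad\Longrightarrow\quad
\mathit{res}.1 < \mathit{res}.2 .
\]
```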

7.2 Proving quantitative bounds 

We next show how the spec (17) also obtains quantitative bounds on the out-of-order anomalies in terms of a number of running threads in the following program $e_{qqc}$:

    ( res1 ← getAndInc();
      res2 ← getAndInc();
      return (res1, res2) )  ∥  e                  (24)

The spec of $e$ says that the number of calls to getAndInc in $e$ (i.e., the size of the interference $e$ exhibits) is some fixed $N$:

$$\{\, \tau_s = \emptyset,\ \chi_s = \eta_s \,\}\ \ e\ \ \{\, \exists \eta.\ \tau_s = \emptyset,\ \chi_s = \eta_s \cup \eta,\ |\eta| = N \,\} \qquad (25)$$


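Our goal is to prove that in the absence of external interference for $e_{qqc}$, $\mathit{res}_1 < \mathit{res}_2 + 2N$ (requirement R3).

    { (17)’s precondition with $\eta_s := \eta_s$, $\eta_o := \chi_o$, and $\iota_o := \tau_o$ }
    res1 ← getAndInc();
    { $\exists \iota'.\ \tau_s = \emptyset,\ \chi_s = \eta_s',\ \ldots$ }
        where $\eta_s' = \eta_s \cup (\mathit{res}_1 + 2) \mapsto (\iota', -)$
    res2 ← getAndInc();
    { $\exists \iota'\ \iota.\ \mathsf{ResPast}\ (\eta_s' \cup \eta_o)\ \mathit{res}_2\ \iota\ z,\ \ldots$ }
    { $\exists \iota\ z.\ \iota \subseteq \tau_o \cup (\mathsf{spent}\ \chi_o) \uplus \{z\},\ z \notin \iota',\ \mathit{res}_1 + 2 < \mathit{res}_2 + 2 + 2\,|\iota' \cap \iota|$ }
    return (res1, res2)   // =: res
    { $\mathit{res}.1 < \mathit{res}.2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o|$ }

Figure 7. Proof outline of sequential composition in (24).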

We first verify the sequential composition of the two calls in (24); the proof outline is in Figure 7. As previously, we start by instantiating the logical variables $\eta_s$, $\eta_o$ and $\iota_o$ from spec (17) with $\eta_s$, $\chi_o$ and $\tau_o$, respectively. In the assertion resulting from the first getAndInc, we keep only the clauses involving $\tau_s$ and $\chi_s$, dropping the rest. To verify the second getAndInc call, we instantiate $\eta_s$, $\eta_o$ and $\iota_o$ with $\eta_s' = \eta_s \cup (\mathit{res}_1 + 2) \mapsto (\iota', -)$, current $\chi_o$ and $\tau_o$.

In the postcondition of the second call to getAndInc, we focus on the $\mathsf{ResPast}\ (\eta_s' \cup \eta_o)\ \mathit{res}_2\ \iota\ z$ clause, where $\iota$ is the set of tokens snapshot when contributing $\mathit{res}_2 + 2$. Unfolding the definition of ResPast from (18), we obtain $\iota \subseteq \tau_o \cup \mathsf{spent}\ \chi_o \uplus \{z\}$. Also, using $(\mathit{res}_1 + 2) \mapsto (\iota', -)$ in the implication that the unfolding obtains, we get $z \notin \iota'$ and

$$\mathit{res}_1 + 2 < \mathit{res}_2 + 2 + 2\,|\iota' \cap \iota| \qquad (26)$$

Now we use the following trivial fact to simplify.

Lemma 7.1. If $z \in \iota$ and $z \notin \iota'$, then $|\iota' \cap \iota| \le |\iota| - 1$.
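The set-theoretic fact behind Lemma 7.1 can be sanity-checked by brute force over small sets. The sketch below reads the lemma with a non-strict inequality (an element of one set that is missing from the other cannot be in their intersection); the function names `subsets` and `lemma_holds` are hypothetical helpers introduced only for this illustration, with `a` playing the role of the past snapshot and `b` the current one.

```python
from itertools import combinations


def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def lemma_holds(universe):
    """Check |a & b| <= |b| - 1 whenever some z is in b but not in a."""
    for a in subsets(universe):
        for b in subsets(universe):
            has_witness = any(z in b and z not in a for z in universe)
            if has_witness and not len(a & b) <= len(b) - 1:
                return False
    return True
```

The bound is tight: with an empty `a` and a singleton `b`, both sides are 0, so a strict inequality would not hold in general.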

Using the invariant (v), Lemma 7.1 derives $|\iota' \cap \iota| \le |\iota| - 1$, after which the inclusion $\iota \subseteq \tau_o \cup \mathsf{spent}\ \chi_o \uplus \{z\}$ leads to

$$|\iota' \cap \iota| \le |\tau_o \cup \mathsf{spent}\ \chi_o| \qquad (27)$$

Combined with (26), this gives us $\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o|$, as shown in Figure 7’s postcondition. In words, it asserts that the discrepancy between res.1 and res.2 is bounded by the number of tokens that are either held by the interfering threads at the end or are spent.
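The two steps can be laid out as one calculation. This is a sketch in which $\iota$ stands for the token set snapshot by the second call, $\iota'$ for the one snapshot by the first call, and the inequalities on cardinalities are read as non-strict:

```latex
\begin{align*}
|\iota' \cap \iota| &\le |\iota| - 1
  && \text{Lemma 7.1, as } z \in \iota,\ z \notin \iota' \\
&\le |\tau_o \cup \mathsf{spent}\ \chi_o|
  && \text{as } \iota \subseteq \tau_o \cup \mathsf{spent}\ \chi_o \uplus \{z\} \\[4pt]
\mathit{res}_1 + 2 &< \mathit{res}_2 + 2 + 2\,|\iota' \cap \iota|
  && \text{by (26)} \\
&\le \mathit{res}_2 + 2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o|
  && \text{by (27)}
\end{align*}
```

Cancelling the constant 2 on both sides yields $\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o|$.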

Figure 8 shows the proof outline for $e_{qqc}$ via the spec from Figure 7. By the parallel composition rule (11), the precondition splits into two subjective views, where we send the initial history $\eta_0$ to the left thread, and the empty history to the right thread. The proof from Figure 7 then applies to the left thread, and the spec (25) applies to the right one.
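    { $\tau_s = \emptyset,\ \chi_s = \eta_0,\ \ldots$ }   // P

      Left thread:
        { $\tau_s = \emptyset,\ \chi_s = \eta_0$ }
        res1 ← getAndInc();
        res2 ← getAndInc();
        return (res1, res2)   // =: res
        { $\mathit{res}.1 < \mathit{res}.2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o|$ }

      Right thread:
        { $\tau_s = \emptyset,\ \chi_s = \emptyset$ }
        e
        { $\exists \eta.\ \chi_s = \eta,\ |\eta| = N,\ \ldots$ }

    // res1 := res.1.1, res2 := res.1.2
    { $\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ (\chi_o \uplus \eta)|$ }
    { $\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o| + 2N$ }   // Q

Figure 8. Proof outline for the program $e_{qqc}$.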

The final $\chi_o$ of the left thread is the union of $\chi_o$ of the joined thread with $\eta$, since the environment of the left thread includes both the right thread and the environment of the join. Rewriting by this property in the postcondition of the left thread gives us the post of the joint thread: $\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ (\chi_o \uplus \eta)|$, which we can next simplify into

$$\mathit{res}_1 < \mathit{res}_2 + 2\,|\tau_o \cup \mathsf{spent}\ \chi_o| + 2N$$

because spent distributes over $\uplus$, and $|\mathsf{spent}\ \eta| = |\eta| = N$. Finally, we restrict the external interference by considering (hide $e_{qqc}$). From the properties of hiding, we deduce that $\tau_o$ and $\chi_o$ in $Q$ are empty, hence we can simplify into $\mathit{res}_1 < \mathit{res}_2 + 2N$, which is the desired result R3.
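The bound R3 can also be checked exhaustively on a small operational model. The sketch below makes several assumptions that are not part of the formal development: each getAndInc call is modelled as two atomic steps (a balancer flip followed by a fetch-and-add of 2), the interfering program runs its N calls as N separate one-call threads, and the names `schedules`, `client_results` and `bound_holds` are hypothetical.

```python
def schedules(counts):
    """Yield every interleaving; counts[i] is the number of steps left for thread i."""
    if sum(counts) == 0:
        yield ()
        return
    for tid, left in enumerate(counts):
        if left:
            rest = counts[:tid] + (left - 1,) + counts[tid + 1:]
            for tail in schedules(rest):
                yield (tid,) + tail


def client_results(schedule):
    """Replay a schedule; thread 0 is the client, all other threads interfere."""
    balancer, counters = 0, [0, 1]
    pending, client = {}, []
    for tid in schedule:
        if tid in pending:                 # second step: fetch-and-add 2
            wire = pending.pop(tid)
            if tid == 0:
                client.append(counters[wire])
            counters[wire] += 2
        else:                              # first step: flip the balancer
            pending[tid] = balancer
            balancer ^= 1
    return tuple(client)


def bound_holds(n):
    """Check res1 < res2 + 2*n over all interleavings with n interfering calls."""
    counts = (4,) + (2,) * n               # client: two calls; each other thread: one call
    return all(r1 < r2 + 2 * n
               for r1, r2 in map(client_results, schedules(counts)))
```

For instance, the schedule `(1, 0, 0, 0, 0, 1)` lets one interfering thread hold a token across both client calls, so the client observes its results out of order, yet still within the bound for N = 1.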

8. Discussion 

Reasoning about quantitatively quiescent queues The idea of interference-capturing histories, which allowed us to characterize the out-of-order discrepancies between the results of a counting network in Section 6, can be applied to specify other balancer-based data structures, for instance, queues [10]. The picture on the right illustrates schematically a non-linearizable queue [10], which is built out of two atomic queues, $q_0$ and $q_1$, and two balancers, $\mathit{bal}_e$ and $\mathit{bal}_d$. The balancers are used to distribute the workload between the two queues by directing the threads willing to enqueue and dequeue elements, correspondingly.

One can think of representing the pending enqueue/dequeue requests to each of the two queues, $q_0$ and $q_1$, by two separate sets of tokens, as shown in Figure 9. The white and gray boxes correspond to the present and dequeued nodes in the queue in the order they were added/removed. Therefore, white elements are those that are currently in the queue. Similarly, the white-colored tokens are for enqueueing elements, so the elements x, y, z and k are going to be added to the corresponding atomic queues. Gray-colored tokens correspond to dequeueing capabilities for one or another atomic queue, distributed among the threads, so the elements c and d are going to be removed next, at the expense of the corresponding dequeue tokens. The timestamps of the entries in the queue history, omitted from the figure, are created as elements are being enqueued to $q_0$ and $q_1$, and the parity of a timestamp corresponds to the atomic queue being changed. Thus, there might be “gaps” in the combined queue history reminiscent of the gaps in the counter history from Section 6 (e.g., the gap caused by the absence of an “even” element in the combined history right between d and e in Figure 9, as indicated by “?”), which will cause out-of-order anomalies during concurrent executions. By accounting for the number of past and present tokens for enqueueing and dequeueing, one should be able to capture the effects of interference and express a quantitative boundary on the discrepancy between the results coming out of order.

Figure 9. Tokens and histories of a balancer-based queue. (Panels: tokens for enqueueing/dequeueing; history of the queue.)
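The structure described above can be sketched in a few lines of Python. This is a sequential toy model under assumptions, not the queue of [10]: the class name `BalancedQueue`, the use of a single toggle bit per balancer, and the decision to return `None` without toggling when the selected queue is empty are all hypothetical choices made for illustration.

```python
from collections import deque


class BalancedQueue:
    """Toy balancer-based queue: two FIFO queues and two toggle bits."""

    def __init__(self):
        self.queues = (deque(), deque())   # the atomic queues q0 and q1
        self.bal_e = 0                     # balancer directing enqueuers
        self.bal_d = 0                     # balancer directing dequeuers

    def enqueue(self, x):
        """Pick a queue via the enqueue balancer, then append to it."""
        i = self.bal_e
        self.bal_e ^= 1
        self.queues[i].append(x)

    def dequeue(self):
        """Pick a queue via the dequeue balancer; None if that queue is empty."""
        i = self.bal_d
        if not self.queues[i]:
            return None                    # assumed policy: do not toggle
        self.bal_d ^= 1
        return self.queues[i].popleft()
```

Used sequentially, the two balancers stay in step and the structure behaves as a FIFO queue. In a concurrent run, a thread may stall between consulting a balancer and touching its queue, which is the source of the out-of-order results discussed above.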

How much information to expose in a spec? The specs 
we have proved for concurrent objects in Sections 2 and 6 
allow for efficient compositional reasoning about clients, but 
they are also non-trivial to formulate and verify. Luckily, the 
FCSL way of reasoning provides a flexible solution for the 
compositionality-versus-complexity conundrum [32, §7]. 

In FCSL, it is up to the library implementor to decide how much implementation-specific insight should go into a spec. The amount of such detail is determined based on the foreseen client scenarios. For instance, we have hidden the balancer in the spec (17), but decided to keep the exact constant 2, which would allow us to derive more precise quantitative bounds later (see Section 7.2). However, we could have hidden this component too (as well as, for instance, some parts of the invariant I), by employing in the specification sigma-types (a dependently-typed analogue of existential types), provided by FCSL as it’s embedded into Coq [7]. We could have also omitted tokens from the spec, thereby reducing the set of derivable client-specific properties to Section 6’s R1 only.

9. Mechanization and Evaluation 

In order to assess the feasibility of the ideas presented above, we have mechanized the specs and the proofs of all the examples from this paper, taking advantage of the fact that FCSL has been recently implemented as a tool for concurrency verification [41] on top of the Coq proof assistant [7].

Table 1 summarizes the statistics with respect to our 
mechanization in terms of lines of code and compilation 
times. The examples were proof-checked on a 3.1 GHz Intel 
Program               Facts    Inv   Stab   Main   Total   Build
Exchanger (§3)          365   1085    446    162    2058   4m 46s
Exch. Client (§5)       258      -      -    182     440   57s
Count. Netw. (§6)       379    785    688     27    1879   12m 23s
CN Client 1 (§7.1)      141      -      -    180     321   3m 11s
CN Client 2 (§7.2)      115      -      -    259     374   3m 9s


Table 1. Mechanization of the examples: lines of code for program-specific facts (Facts), resource invariants and transitions (Inv), stability proofs for desired specs (Stab), spec and proof sizes for main functions (Main), total LOC count (Total), and build times (Build). The “-” entries indicate the components that were not needed for the example.

Core i7 OS X machine with 16 GB RAM, using Coq 8.5pl2 and Ssreflect 1.6 [18]. As the table indicates, a large fraction of the implementation is dedicated to proofs of preservation of resource invariants (Inv), i.e., checking that the actual implementations do not “go wrong”. In our experience, these parts of the development are the trickiest, as they require library-specific insights to define and reason about auxiliary histories. Since FCSL is a general-purpose verification framework, which does not target any specific class of programs or properties, we had to prove problem-specific facts, e.g., lemmas about histories of a particular kind (Facts), and to establish that the specs of interest are stable (Stab). Once this infrastructure had been developed, the proofs of main procedures turned out to be relatively small (Main).

Fortunately, trickiness in libraries is invisible to clients, 
as FCSL proofs are compositional. Indeed, because specs 
are encoded as Coq types [41], the substitution principle 
automatically applies to programs and proofs. At the mo¬ 
ment, our goal was not to optimize the proof sizes, but to 
demonstrate that FCSL as a tool is suitable off-the-shelf 
for machine-checked verification of properties in the spirit 
of novel correctness conditions [3, 23, 29]. Therefore, we 
didn’t invest into building advanced tactics [35] for specific 
classes of programs [53] or properties [5, 6, 14, 52], and we 
leave developing such automation for future work. 

10. Related Work 

Linearizability and history-based criteria. The need for correctness criteria alternative to linearizability [26], which are more relaxed yet compositional, was recognized in the work on counting networks [3]. The suggested notion of quiescent consistency [44] required the operations separated by a quiescent state to take effect in their logical order. A more
refined correctness condition, quasi-linearizability, imple¬ 
menting a relaxed version of linearizability with an upper 
bound on nondeterminism, was proposed by Afek et al. [1], 
allowing them to obtain the quantitative boundaries simi¬ 
lar to what we proved in Section 7.2. The idea of relaxed 
linearizability was later used in the work on quantitative 
relaxation (QR) [24] for designing scalable concurrent data 


structures by changing the specification set of sequential his¬ 
tories. Most recently, quantitative quiescent consistency has 
been proposed as another criterion incorporating the pos¬ 
sibility to reason about effects of bounded thread interference [29]. It is worth noticing that some of these correctness criteria are incomparable (e.g., QC and QR [24], QL and QQC [29]); hence, for a particular concurrent object, choosing one or another criterion should be justified by the needs
dition is essentially “in the eye of the beholder”, as is typical 
in programming, when designing libraries and abstract data 
structures, and the logic-based approach we advocate pro¬ 
vides precisely this flexibility in choosing desired specs. 

Hoare-style specifications of concurrent objects. Hoare-style program logics were used with great success to verify a number of concurrent data structures and algorithms, which are much more natural to specify in terms of observable state modifications, rather than via call/return histories.
The examples of such objects and programs include barri¬ 
ers [13, 27], concurrent indices [8], flat combiner [42, 48], 
event handlers [45], shared graph manipulations [38, 41], 
as well as their multiple client programs. The observation 
about a possibility of using program logics as a correct¬ 
ness criterion, alternative to linearizability, has been made in 
some of the prior works [8, 28, 46]. Their criticism of lin¬ 
earizability addressed its inability to capture the state-based 
properties, such as dynamic memory ownership [28]— 
something that linearizability indeed cannot tackle, unless 
it’s extended [20]. However, we are not aware of any prior 
attempts to capture CAL, QC and QQC-like properties of 
concurrent executions by means of one and the same pro¬ 
gram logic and employ them in client-side reasoning. 

Several logics for proving linearizability or, equivalently, 
observational refinement [16, 50], have been proposed re¬ 
cently [34, 48, 51], all employing variations of the idea of 
using specifications as resources, and identifying (possibly, 
non-fixed or non-local) linearization points, at which such 
specification should be “run”. In these logics, after establish¬ 
ing linearizability of an operation, one must still devise its 
Hoare-style spec, such that the spec is useful for the clients. 

Similarly to the way linearizability allows one to replace 
a concurrent operation by an atomic one, several logics have 
implemented the notion of logical atomicity, allowing the 
clients of a data structure to implement application-specific 
synchronization on top of the data structure operations. Log¬ 
ical atomicity can be implemented either by parametrizing 
specs with client-specific auxiliary code [28, 31, 45, 46] or 
by engineering dedicated rules relying on the simulation be¬ 
tween the actual implementation and the “atomic” one [9]. 

Instead of trying to extend the existing approaches for 
logical atomicity to non-linearizable objects (for which the 
notion of atomicity is not intuitive), we relied on a gen¬ 
eral mechanism of auxiliary state, provided by FCSL [36]. 
Specifically, we adopted the idea of histories as auxiliary 
state [42], which, however, was previously explored in the context of FCSL only for specifying linearizable structures. We introduced enhanced notation for referring directly to histories (e.g., $\chi_o$), although FCSL’s initial logical infrastructure and inference rules remained unchanged.

Recently, attempts were made to unify the common idioms occurring in a number of concurrency logics in a generic framework of Views [11]. However, that result is orthogonal to our findings, as Views are a framework for proving logics sound, not for proving programs, whereas in this paper we focused on using a particular logic (FCSL) for specifying a new class of concurrent data structures.

In this work, we do not argue that FCSL is the only 
logic capable of encoding custom correctness conditions and 
their combinations, though, we are not aware of any other 
work exploring a similar possibility. However, we believe 
that FCSL’s explicit other subjective state component pro¬ 
vides the most straightforward way to do so. The logics like 
CAP [12] and TaDA [9], from our experience and personal 
communication with their authors, may be capable of imple¬ 
menting our approach at the expense of engineering a much 
more complicated structure of capabilities to encode histo¬ 
ries and their invariants, and “snapshot” interference of an 
environment. Other logics incorporating the generic PCM 
structure [30, 31, 38, 49] might be able to implement our 
approach, although none of these logics provide an FCSL- 
style rule for hiding (12) as a uniform mechanism to express 
explicit quiescence. 

Concurrently with this work, Hemed et al. developed a 
(not yet mechanized) verification technique for CAL [23], 
which they applied to the exchanger and the elimination 
stack. Similarly to our proposal, they specify CAL-objects 
via Hoare logic, but using one global auxiliary history, 
rather than subjective auxiliary state. This tailors their sys¬ 
tem specifically to CAL (without a possibility to incorpo¬ 
rate reasoning about other, non CA-linearizable, concurrent 
structures), and to programs with a fixed number of threads. 
In contrast, FCSL supports dynamic thread creation, and is 
capable of uniformly expressing and mechanically verifying 
several different criteria, with CAL merely a special case, 
obtained by a special choice of PCM. Moreover, in FCSL 
the criteria combine, as illustrated in Section 5, where we 
combined quiescence with CAL via hiding. Hiding is cru¬ 
cial for verifying clients with explicit concurrency, but is 
currently unsupported by Hemed et al.’s method.

11. Conclusion and Future Work 

We have presented a number of formalization techniques, 
enabling specification and verification of highly scalable 
non-linearizable concurrent objects and their clients in Hoare- 
style program logics. In particular, we have explored several 
reasoning patterns, all involving the idea of formulating ex¬ 
ecution histories as auxiliary state, capturing the expected 
concurrent object behavior. We have discovered that quanti¬ 


tative logic-based reasoning about concurrent behaviors can 
be done by storing relevant information about interference 
directly into the entries of a logical history. 

We believe that our results help to bring the Hoare-style 
reasoning into the area of non-linearizable concurrent ob¬ 
jects and open a number of exciting opportunities for the 
field of mechanized logic-based concurrency verification. 

For instance, in this paper we have deliberately chosen to 
focus on simple client programs to showcase the specs we 
gave to concurrent libraries. However, any larger program 
incorporating these examples can be verified composition- 
ally in FCSL, out of these clients’ specs, via the substitu¬ 
tion principles of FCSL [36, 41], without the need to deal 
with concepts such as histories and tokens that are specific 
to particular libraries. Given the bounds, which we formally 
proved in Section 7, we believe that the reasoning patterns 
we have described will be useful for mechanical verification 
of larger weakly-synchronized approximate parallel compu¬ 
tations [39], exploiting the QC and QQC-like behavior. 

Furthermore, by ascribing interference-sensitive quanti¬ 
tative specs in the spirit of (17) to relaxed concurrent li¬ 
braries [24], one can assess the applicability of a library im¬ 
plementation for its clients: the clients should tolerate the 
anomalies caused by interference, as long as they can logi¬ 
cally infer the desired safety assertions from a library spec, 
which is fine-tuned for particular usage scenarios. 
A. Exchanger Invariants and Proof Outline

Additional exchanger invariants The states in the exchanger state-space must satisfy other invariants in addition to (9). These properties arise from our description of how the exchanger behaves on decorated state. We abbreviate with $p \mapsto (x; y)$ the heap $p \mapsto x \uplus p{+}1 \mapsto y$.

(i) $h_j$ contains a pointer $g$ and a number of offers $p \mapsto (v; x)$, and $g$ points to either null or to some offer in $h_j$.

(ii) $\chi_s$, $\chi_o$, $\|m_j\|$ contain only disjoint time-stamps. Similarly, $\pi_s$ is disjoint from $\pi_o$.

(iii) All offers in $m_j$ are matched and owned by some thread: $\exists t.\ p \mapsto (t, v, w) \subseteq m_j \implies p \in \pi_s \cup \pi_o,\ p \mapsto (v; M\ w) \subseteq h_j$.

(iv) There is at most one unmatched offer; it is the one linked from $g$. It is owned by someone: $p \mapsto (v; U) \subseteq h_j \implies p \in \pi_s \cup \pi_o,\ g \mapsto p \subseteq h_j$.

(v) Retired offers aren’t owned: $p \mapsto (v; R) \subseteq h_j \implies p \notin \pi_s \cup \pi_o$.

(vi) The outstanding offers are included in the joint heap, i.e., if $p \in \pi_s \cup \pi_o$ then $p \in \mathsf{dom}\ h_j$.

(vii) The combined history $\chi_s \uplus \chi_o \uplus \|m_j\|$ is gapless: if it contains a time-stamp $t$, it also contains all the smaller time-stamps (sans 0).

Explaining the proof outline Figure 10 presents the proof outline for the spec (10). We start with the precondition, and after allocation in line 2, $h_s$ stores the offer $p$ in line 3.

If CAS at line 4 succeeds, the program “installs” the offer; that is, the state (real and auxiliary) is changed simultaneously with the modification of $g$. In particular, $p$ is added to $\pi_s$, and the offer $p$ changes ownership, moving from $h_s$ to $h_j$. Since $b$ will be bound to null, this leads us to the assertion in line 7. We explain in Section 4 how these kinds of changes to the auxiliary state, which are supposed to occur simultaneously with some atomic operation (in this case, CAS), are specified and verified in FCSL. The assertion in line 7 further states bounded $p\ v\ \eta$. We do not formally define bounded here (it is in the proof scripts accompanying the paper), but it says that $p$ has been moved to $h_j$, i.e., $p \mapsto (v; -) \subseteq h_j$, and that any time-stamp $t$ at which another thread may match $p$, and thus place an entry for $p$ into $m_j$, must satisfy $\mathsf{last}(\eta) < t, \hat{t}$. Intuitively, this property is valid, and stable under interference, because entries in $m_j$ can be added only by generating fresh time-stamps wrt. the collective history $\chi_o \uplus \|m_j\|$, and $\eta$ is a subset of it. If CAS in line 4 fails, then nothing changes, so we move to the spec in line 15.

At line 8, CAS succeeds if $x = U$, and fails if $x = M\ w$. Notice that $x$ cannot be $R$; since we own $p \in \pi_s$, no other thread could retire $p$. If CAS fails, then the offer has been matched with $w$. CAS simultaneously “collects” the offer as follows. By invariant (iii), and bounded $p\ v\ \eta$, the auxiliary map $m_j$ contains an entry $p \mapsto (t, v, w)$, where $\mathsf{last}(\eta) < t, \hat{t}$. The auxiliary state is changed to remove $p$ from $m_j$, and simultaneously place $t \mapsto (v, w)$ into $\chi_s$. If CAS succeeds, the offer was unmatched, and is “retired” by


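 1  { $h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|$ }
 2  p ← alloc (v, U);
 3  { $h_s = p \mapsto (v; U),\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|$ }
 4  b ← CAS (g, null, p);
 5  if b == null then
 6    sleep (50);
 7    { $h_s = \emptyset,\ \pi_s = \{p\},\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \mathsf{bounded}\ p\ v\ \eta$ }
 8    x ← CAS (p+1, U, R);
 9    { $h_s = \emptyset,\ \pi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,$
10      $x = M\ w \implies \exists t.\ \chi_s = t \mapsto (v, w),\ \mathsf{last}(\eta) < t, \hat{t},$
11      $x = U \implies \chi_s = \emptyset$ }
12    if x is M w then return (Some w)
13    else return None
14  else
15    { $h_s = p \mapsto (v; U),\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|$ }
16    dealloc p;
17    { $h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|$ }
18    cur ← read g;
19    { $h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,$
20      $\mathit{cur} = \mathsf{null} \lor \mathit{cur} \mapsto (w; -) \subseteq h_j$ }
21    if cur == null then return None
22    else
23      { $h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \mathit{cur} \mapsto (w; -) \subseteq h_j$ }
24      x ← CAS (cur+1, U, M v);
25      { $h_s = \emptyset,\ \pi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \mathit{cur} \mapsto (w; y) \subseteq h_j,$
26        $x = U \implies y = M\ v,\ \exists t.\ \chi_s = t \mapsto (v, w),\ \mathsf{last}(\eta) < t, \hat{t},$
27        $x \neq U \implies \chi_s = \emptyset,\ y \neq U$ }
28      CAS (g, cur, null);
29      { same as above; the state satisfies (iv) because $y \neq U$ }
30      if x == U then w ← read cur; return (Some w)
31        { $h_s = \emptyset,\ \pi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \mathit{res} = \mathsf{Some}\ w,$
32          $\exists t.\ \chi_s = t \mapsto (v, w),\ \mathsf{last}(\eta) < t, \hat{t}$ }
33      else return None
34        { $h_s = \emptyset,\ \pi_s = \emptyset,\ \chi_s = \emptyset,\ \eta \subseteq \chi_o \uplus \|m_j\|,\ \mathit{res} = \mathsf{None}$ }

Figure 10. Proof outline for the exchanger.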

removing $p$ from $\pi_s$. Lines 12-13 branch on $x$, selecting either the assertion 10 or 11, so the postcondition follows.

After reading cur in line 18, by invariant (i), we know that cur either points to null, or to some offer $p \mapsto (w; -) \subseteq h_j$.

At line 24, the CAS succeeds if $x = U$ and fails otherwise. If CAS succeeded, then it “matches” the offer in cur; that is, it writes $M\ v$ into the hole of cur, and changes the auxiliary state as follows. It takes $t$ to be the smallest unused time-stamp in the history $\chi = \chi_s \uplus \chi_o \uplus \|m_j\|$. Thus $\mathsf{last}(\chi) < t$, and because $\chi$ has even size by invariant (9), $t$ must be odd, and hence $t < \hat{t} = t + 1$. The entry $t \mapsto (v, w)$ is placed into $\chi_s$, giving us assertion 26. To preserve the invariant (iii), CAS simultaneously puts the entry $p \mapsto (\hat{t}, w, v)$ into $m_j$, for future collection by the thread that introduced offer cur. But we do not need to reflect this in line 26. If the CAS fails, the history $\chi_s$ remains empty, as no matching is done. However, the hole $y$ associated with cur cannot be $U$, as then CAS would have succeeded. Therefore, it is sound in line 28 to “unlink” cur from $g$, as the unlinking will not violate the invariant (iv), which says that an unmatched offer must be pointed to by $g$. Finally, lines 30 and 33 select the assertion 26 or 27, and either way, directly imply the postcondition.
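The protocol walked through above can be replayed with a small sequential model. The sketch below is an assumption-laden illustration, not the verified code: it keeps only the real state (the pointer g and the offers with their holes), drops all auxiliary state, and exposes one method per atomic phase so that interleavings can be scripted by hand. The names `Exchanger`, `install`, `collect` and `match`, and the encoding of holes as `"U"`, `"R"` or `("M", w)`, are hypothetical.

```python
U, R = "U", "R"            # unmatched / retired markers; a match is ("M", w)


class Exchanger:
    """Sequential toy model of the offer protocol, one method per atomic phase."""

    def __init__(self):
        self.g = None                      # the global pointer; None plays null

    def install(self, v):
        """Lines 2-4: allocate an offer and CAS it into g if g is null."""
        if self.g is None:
            self.g = {"val": v, "hole": U}
            return self.g
        return None                        # CAS failed; the offer is discarded

    def collect(self, offer):
        """Lines 8-13: retire an unmatched offer, or collect the matched value."""
        if offer["hole"] == U:
            offer["hole"] = R
            return None
        return offer["hole"][1]            # the hole is ("M", w)

    def match(self, v):
        """Lines 18-33: try to match the offer linked from g, then unlink it."""
        cur = self.g
        if cur is None:
            return None
        matched = cur["hole"] == U
        if matched:
            cur["hole"] = ("M", v)         # CAS (cur+1, U, M v) succeeded
        if self.g is cur:                  # CAS (g, cur, null)
            self.g = None
        return cur["val"] if matched else None
```

For example, if one thread installs an offer with value 1 and another calls `match(2)`, the matcher obtains 1 and the installer later collects 2. If the installer times out first, `collect` retires the offer and a later `match` only unlinks it.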